
International Review on Computers and Software (I.RE.CO.S.), Vol. 2, n. 3, May 2007

Speech Denoising and Arabic Speaker Recognition System Using Subband Approach

Z. Sakka, A. Kachouri, M. Samet

Abstract - This paper proposes an efficient speech recognition method for the Arabic language. A speech recognition system based on Hidden Markov Models was designed and tested on automatic Arabic word recognition. The system is an isolated whole-word speech recognizer, and it was implemented in both a wideband speech signal mode and a subband spectral recognition mode. We particularly discuss the selection of the most critical subbands for the speaker recognition task and the choice of an optimal division of the frequency domain. An appropriate selection of the most critical subbands shows that very good performance is still obtained using only half of the frequency domain. The decision strategy rests on the individual decisions of the recognizers in each subband. This recognition system achieved 89.5% correct word recognition in the wideband mode and 95.25% in the subband mode. A comparison between the various analysis variants is made to observe their performance. Copyright © 2007 Praise Worthy Prize S.r.l. All rights reserved.

Keywords: Arabic words, recognition, speech, subband, HMMs

I. Introduction

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers. In recent years, numerous research efforts have been made to improve the performance of recognition systems. While Arabic has not been the object of as much linguistic research as other languages such as English and Japanese, some research has been conducted on isolated Arabic word recognition. In 1985, Hagos [1] and Abdullah [2] separately reported Arabic word recognizers. Hagos designed a speaker-independent Arabic digit recognition system that used template matching for input utterances; his system is based on LPC parameters for feature extraction and the log-likelihood ratio for similarity measurements. Abdullah developed another Arabic word recognition system that used positive-slope and zero-crossing durations as the feature extraction algorithm. The systems mentioned above are isolated-word recognition systems. Later, Al-Otaibi [3] developed an automatic Arabic vowel recognition system, and isolated Arabic digit recognition systems were implemented. Recently, a new approach to automatic word recognition, called the subband approach, was proposed in [4].

Manuscript received and revised April 2007, accepted May 2007

In a subband system, only the information coming from the noisy frequency band is degraded, while the information coming from the other bands can still be exploited for better recognition [5], [6]. This approach draws on studies suggesting that the human auditory system processes speech locally in the time-frequency domain before performing recognition proper [5], [6], [7]. The method is therefore more robust to noise confined to part of the frequency spectrum. In a classical system, which extracts acoustic parameters over the whole frequency band, all the coefficients are disrupted even when the noise occupies only a small part of the band. In this study, we consider a new approach to automatic Arabic speech recognition that overcomes many limitations of classical wideband systems. The principle of this subband approach is to build an Arabic speech model in the time-frequency domain using the formalism of Hidden Markov Models. The dependence of Arabic word recognizer performance on the choice of frequency bands is investigated with both objective and subjective tests. According to the objective test results, the 0-1000 Hz and 3000-4500 Hz frequency bands are more significant for speaker discrimination than the other frequencies. In particular, the paper deals with the following issues: (1) the development of the architecture of the subband approach for the recognition process; (2) the implementation of an HMM wideband system to recognize Arabic words, in which the global model is composed of left-to-right HMMs, each trained using samples from an Arabic database; (3) the presentation of the Arabic subband recognition system, including speech pre-processing and feature extraction; and (4) illustrative experiments on isolated-word subband recognition tasks.

II. Subband Model

The subband-based speaker recognition system can be seen as a combination of multiple recognizers (one for each subband) associated with a decision module that recombines the outputs of the subband recognizers. Fig. 1 illustrates the general principle of our subband system. Subband models for speech recognition have been proposed in [a]. Some issues involved in designing a subband model can be generalized to speaker recognition: (1) the architecture of the subband-based system (selection of the most critical subbands for the recognition task; optimal division of the whole frequency domain), and (2) the recombination of the subband recognizer outputs (recombination level, recombination strategies, fusion of multiple decisions).

III. Hidden Markov Models

Since their introduction in speech processing, hidden Markov models have become so important that almost all automatic speech recognition systems use this modelling [9], [10]. Hidden Markov models assume that the modelled phenomenon is an uncertain, unobservable process that manifests itself through emissions which are themselves uncertain. These two levels give the Markov approach a flexibility that is attractive for representing a phenomenon as complex as speech production.

III.1. Modelling of the Speech Signal

To achieve isolated Arabic word recognition, we assume that each word is represented by a Markov model; in the more general case, these word models are themselves built by concatenating basic acoustic units. The states can be interpreted as configurations of the vocal apparatus, and the observations emitted on arrival in a state correspond to acoustic frames. These frames are usually represented by vectors of continuous parameters, so it is necessary either to reduce the problem to the case of discrete emitted symbols by vector quantization, or to model the emission probabilities with continuous probability densities.

The difficulties encountered in developing automatic speech recognition systems come from the variability of the speech signal [11], [12]. Among the methods developed, the statistical approach based on Hidden Markov Models (HMMs) appears to be the most efficient. We built the isolated-word recognition system shown in Fig. 2. The HMM used is a left-to-right model. Each state is defined by a mean and a variance, and the Baum-Welch algorithm is used to re-estimate the mean and the variance from the training data at each iteration.
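One Baum-Welch pass of the kind described above can be sketched as follows: a scaled forward-backward computation of the state-occupancy probabilities, followed by re-estimation of each state's mean and variance. This is a minimal NumPy sketch, not the authors' implementation; it assumes a single training utterance, diagonal Gaussian states, a left-to-right topology without skips whose transition probabilities are kept fixed (p_stay = 0.6 is an arbitrary assumption), and synthetic 12-dimensional vectors standing in for LPCC frames. All function names are hypothetical.

```python
import numpy as np


def left_to_right_transitions(n_states, p_stay=0.6):
    """Left-to-right transition matrix without skips (p_stay is an assumed value)."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = p_stay
        A[i, i + 1] = 1.0 - p_stay
    A[-1, -1] = 1.0  # the last state loops on itself
    return A


def emission_likelihoods(X, means, variances):
    """Diagonal-Gaussian likelihoods b_j(x_t), returned as a (T, N) array.

    Each frame is divided by its largest likelihood: a per-frame constant cancels
    in the state-occupancy probabilities, so this only prevents underflow."""
    diff = X[:, None, :] - means[None, :, :]                  # (T, N, D)
    log_b = -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)
    return np.maximum(np.exp(log_b - log_b.max(axis=1, keepdims=True)), 1e-300)


def state_occupancy(A, B):
    """Scaled forward-backward pass; gamma[t, j] = P(state j at frame t | X)."""
    T, N = B.shape
    alpha, beta, c = np.zeros((T, N)), np.ones((T, N)), np.zeros(T)
    alpha[0, 0] = B[0, 0]          # a left-to-right model starts in state 0
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def reestimate_gaussians(X, means, variances, A, var_floor=1e-3):
    """One Baum-Welch update of the state means and variances from one utterance."""
    gamma = state_occupancy(A, emission_likelihoods(X, means, variances))
    occ = gamma.sum(axis=0)[:, None] + 1e-10                  # expected frames per state
    new_means = gamma.T @ X / occ
    new_vars = gamma.T @ X ** 2 / occ - new_means ** 2
    return new_means, np.maximum(new_vars, var_floor), gamma


# Toy usage: 60 synthetic 12-dimensional frames in three stationary segments.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, size=(20, 12)) for m in (0.0, 5.0, 10.0)])
segments = np.split(X, 3)                                     # uniform initial segmentation
means0 = np.array([s.mean(axis=0) for s in segments])
vars0 = np.array([s.var(axis=0) for s in segments])
means, variances, gamma = reestimate_gaussians(X, means0, vars0, left_to_right_transitions(3))
```

In a full trainer the occupancy sums would be accumulated over all training utterances of a word before dividing, and the pass would be repeated until the likelihood stops improving; the variance floor keeps a state from collapsing onto a few frames.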

Fig. 1. Block diagram of the subband system


The decoding uses the Viterbi algorithm to find the most likely state sequence corresponding to the observed parameters in a composite model, and thereby deduce the corresponding word. The decoding result is compared with the reference labels by dynamic alignment in order to count the labels that are correctly identified, omitted, substituted by another label, or inserted, and to calculate the recognition rate.
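The Viterbi step can be illustrated on a single left-to-right word model. The sketch below works in the log domain to avoid underflow and forces the path to start in the first state; the toy model (three states, two discrete symbols) and the function name are illustrative assumptions, and neither the composite-model decoding nor the dynamic alignment against reference labels is reproduced.

```python
import numpy as np


def viterbi(log_A, log_B):
    """Most likely state path for one utterance.

    log_A: (N, N) log transition matrix; log_B: (T, N) log emission scores.
    The path is forced to start in state 0 (left-to-right model)."""
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)
    psi = np.zeros((T, N), dtype=int)
    delta[0, 0] = log_B[0, 0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A        # scores[i, j]: move from i to j
        psi[t] = np.argmax(scores, axis=0)            # best predecessor of each state
        delta[t] = scores[psi[t], np.arange(N)] + log_B[t]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):                     # backtrack through the pointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())


# Toy left-to-right model: state 1 prefers symbol 1, states 0 and 2 prefer symbol 0.
A = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]])   # P(symbol | state)
obs = [0, 0, 1, 1, 0]
with np.errstate(divide="ignore"):                    # log(0) -> -inf is intended
    log_A = np.log(A)
path, log_p = viterbi(log_A, np.log(B[:, obs].T))
```

Here the best path is [0, 0, 1, 1, 2], with probability 0.9^5 × 0.5^4; forbidden transitions simply carry a score of minus infinity.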

IV. System Overview: Wideband Speech Signal

The whole system can be broadly divided into three main sections: segmentation and noise elimination, feature extraction using LPCC [9], [13], [14] and VQ, and recognition using HMMs [9], [10], [15]. The system was partitioned into several modules according to their functionality, as shown in Fig. 2. The first is the digital signal processing front-end module, whose functions are speech acquisition through a microphone, filtering, and sampling. A band-pass filter with cut-off frequencies of 100 Hz and 4.8 kHz was used to filter the speech signal before processing. The sampling rate was set to 16 kHz with 16-bit resolution for all recorded speech tokens. A manual endpoint detection method was used to separate speech from the silent portions of the signal; it also detected the beginning and end points of the spoken word [16], [7]. LPC analysis was performed on sequential frames 64 samples apart (4 ms at 16 kHz); in each case, a 256-point Hamming window was used to select the data points to be analyzed [15]. The linear predictive coding module calculates 12 linear predictive cepstrum coefficients (LPCCs), with LPC order p = 12, for each frame of the spoken utterance. While the VQ stage is being trained, the HMM part is not used: the training set of LPCC vectors is used by the K-means clustering algorithm [14] to iteratively update the vector quantizer codebook until the average distance falls below a preset threshold, and the desired codebook size is obtained with the LBG algorithm [8], [10], [14]. This codebook is used by the vector quantizer in the testing mode. In the HMM training mode, a set of LPCC vectors (corresponding to one utterance of an Arabic word) is quantized by the vector quantizer to give a vector of codebook indices. A set of such vectors (corresponding to multiple utterances of the same word) is used to re-estimate the Hidden Markov Model for that word.
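The LPCC front end described above can be sketched in NumPy as follows. The sketch assumes the values given in the text and in Table I (pre-emphasis 1 − 0.95z⁻¹, 256-sample Hamming window, 64-sample hop, LPC order 12, 12 cepstral coefficients); the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion are our assumptions, since the paper does not say which LPC variant was used. Band-pass filtering and the manual endpoint detection are omitted, and the function names are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=64):
    """Split x into overlapping frames (one per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p (x[n] ~ sum_k a_k x[n-k]) from autocorrelation r."""
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k                                 # prediction error power
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / (1 - sum_k a_k z^-k) via the usual recursion."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c


def lpcc_features(signal, order=12, n_ceps=12, frame_len=256, hop=64, pre=0.95):
    """One row of n_ceps LPCCs per frame."""
    x = np.asarray(signal, dtype=float)
    x = np.append(x[0], x[1:] - pre * x[:-1])             # pre-emphasis 1 - 0.95 z^-1
    window = np.hamming(frame_len)
    feats = []
    for frame in frame_signal(x, frame_len, hop) * window:
        r = np.correlate(frame, frame, mode="full")[frame_len - 1:frame_len + order]
        r[0] += 1e-10                                      # keeps silent frames from dividing by zero
        a, _ = levinson_durbin(r, order)
        feats.append(lpc_to_cepstrum(a, n_ceps))
    return np.array(feats)
```

For one second of audio at 16 kHz this yields 1 + (16000 − 256) // 64 = 247 frames, i.e. a 247 × 12 LPCC matrix.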
This procedure is repeated for each word in the vocabulary. In the testing mode, the set of LPCC vectors corresponding to the unknown word is quantized by the vector quantizer to give a vector of codebook indices. This vector is scored on each word HMM to give a probability score for each word model, and the decision rule chooses the word whose model gives the highest probability.
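The vector-quantization stage can be sketched as a plain LBG codebook design (repeated splitting of every codeword, each split followed by k-means refinement) together with a nearest-codeword quantizer that turns LPCC vectors into the discrete indices fed to the HMMs. Assumptions, none of which are stated in the paper: squared Euclidean distance, an additive splitting perturbation proportional to the per-dimension standard deviation, a power-of-two codebook size, and a relative-distortion stopping threshold. Function names are hypothetical.

```python
import numpy as np


def nearest_codeword(X, codebook):
    """Index of, and squared distance to, the closest codeword for every vector."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d.min(axis=1)


def lbg_codebook(X, size, eps=0.01, tol=1e-4, max_iter=100):
    """LBG design: split every codeword in two, then refine with k-means.

    size is assumed to be a power of two; splitting uses an additive
    perturbation proportional to the per-dimension standard deviation."""
    codebook = X.mean(axis=0, keepdims=True)
    delta = eps * X.std(axis=0)
    while len(codebook) < size:
        codebook = np.vstack([codebook + delta, codebook - delta])
        prev = np.inf
        for _ in range(max_iter):
            labels, dist = nearest_codeword(X, codebook)
            distortion = dist.mean()
            for j in range(len(codebook)):
                members = X[labels == j]
                if len(members):                       # empty cells keep their codeword
                    codebook[j] = members.mean(axis=0)
            if prev - distortion <= tol * distortion:  # relative improvement is small
                break
            prev = distortion
    return codebook


def quantize(X, codebook):
    """Map each LPCC vector to its codebook index (the discrete HMM observation)."""
    return nearest_codeword(X, codebook)[0]


# Toy usage: four well-separated 2-D clusters and a 4-word codebook.
rng = np.random.default_rng(1)
centers = np.array([[-9.0, 0.0], [-3.0, 0.0], [3.0, 0.0], [9.0, 0.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
cb = lbg_codebook(X, 4)
idx = quantize(X, cb)
```

Splitting from the global centroid avoids the random initialisation of plain k-means, which is why LBG is commonly used to reach a target codebook size.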

Fig. 2. System block diagram (speech signal, filtered signal, windowing and blocking, sequence of frames and time alignment, feature extraction, LPCC coefficients of frames, classified word)
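The testing mode described above (score the quantized index sequence on every word HMM, keep the highest probability) can be sketched with the scaled forward algorithm for discrete-observation HMMs. The two 2-state word models over a 4-symbol codebook are hypothetical toy values, not trained models from the paper, and the function names are ours.

```python
import numpy as np


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete-observation HMM.

    pi: (N,) initial probabilities, A: (N, N) transitions, B: (N, M) symbol
    probabilities; B should contain no zeros (e.g. after flooring)."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()               # the scale factors multiply to P(obs)
        log_p += np.log(scale)
        alpha = alpha / scale
    return float(log_p)


def recognize(obs, models):
    """Decision rule: return the word whose model gives the highest score."""
    scores = {word: log_likelihood(obs, *params) for word, params in models.items()}
    return max(scores, key=scores.get), scores


# Two hypothetical 2-state left-to-right word models over a 4-symbol codebook.
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5], [0.0, 1.0]])
models = {
    "word_a": (pi, A, np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]])),
    "word_b": (pi, A, np.array([[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.1, 0.7]])),
}
best, scores = recognize([0, 0, 1, 1], models)
```

Summing the logarithms of the per-frame scale factors gives the same value as the unscaled forward probability, without underflow on long utterances.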

IV.1. Database

An in-house database was created from 4 Arabic words. Twenty-three male native Arabic speakers were asked to utter each word 10 times. Hence, the database consists of 10 repetitions of every word produced by each speaker, i.e. 920 tokens (4 words × 10 repetitions × 23 speakers). All samples for a given speaker were recorded in one session. During the recording session, each utterance was played back to ensure that the entire word was included in the recorded signal. All 920 tokens were used for the training and testing phases, depending on the system run mode. In this research we use two modes, namely the multi-speaker mode and the speaker-independent mode.

IV.2. Experimental Results

Experiments were carried out on the Arabic speech database. The system is trained to recognize four words, /yamin/, /yasiR/, /amima/ and /khalaF/, under two distinct recognition conditions: the multi-speaker and speaker-independent modes. Each word is pronounced by different speakers (male and female). The training and testing data contained more than 1000 wav files, each representing one word. The recognition system used throughout this study is based on a speech representation by temporal and cepstral parameters and on the modelling of words by HMMs. Pre-emphasis and a Hamming window are applied. The function requires the following parameters: signal, sampling frequency, window type and number of coefficients. Table 1 shows some of the system parameters.

TABLE I
SYSTEM PARAMETERS

Parameter                     Value
Sampling rate                 16 kHz, 16 bits
Database                      Isolated Arabic words
Speakers                      23
Repetitions                   10
Filter cut-off frequencies    100 Hz and 4.8 kHz
Pre-emphasis                  1 − 0.95z⁻¹
Window type and size          Hamming, 256
LPC order                     12

In the multi-speaker mode, repetitions of each word uttered by 16 speakers were used for the training phase; the total number of training tokens was 192 (16 speakers × 3 repetitions × 4 words). In the testing mode, all 800 words were used in the recognition phase (20 speakers × 10 repetitions × 4 words), so the training set is a subset of the test set. Table 2 shows the accuracy of the system for each Arabic word. The overall system performance was 89.5%, a relatively average recognition rate. The recognition rate of the word /yasiR/ is 100%, whereas the worst performance was obtained for the word /amima/. Even though the database was small, the system showed an unexpectedly high accuracy considering the variability in pronunciation of Arabic words and the fact that we considered the multi-speaker mode, in contrast to the speaker-dependent mode (where only one speaker trains and uses the system). In addition, our time-alignment algorithm was very simple and straightforward.

The protocol followed in the speaker-independent mode is as follows: 10 repetitions from three speakers (speakers 1 to 3) are used for training (the training set contains 120 words: 3 speakers × 10 repetitions × 4 words), and the tests are made with the twenty remaining speakers. The speaker-independent mode gave a lower accuracy, as shown in Table 3. The overall performance of the system was 73.13% for a system using a five-state HMM. The worst performance was found for the word /khalaF/ (accuracy of 48.5%), and the best for the word /yasiR/ (accuracy of 100%). In general, for the speaker-independent mode, this overall performance is acceptable given the complexity of the recognition task. Fig. 3 displays the error rate for individual Arabic words in both modes.

TABLE II
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE)

yamin    0    6    194    0    Recognition rate 97.00%

TABLE III
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (SPEAKER-INDEPENDENT MODE)

Word      khalaF   amima   yasiR   yamin   Recognition Rate (%)
yamin          3      26       2     169                  84.50
yasiR          0       0     200       0                 100.00
amima         19     119      22      40                  59.50
khalaF        97       3     100       0                  48.50
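The recognition rates in these confusion matrices are the diagonal counts divided by the 200 test tokens of each word (20 speakers × 10 repetitions). Because every word has the same number of test tokens, the overall rate is also the mean of the per-word rates. The small worked example below recomputes the speaker-independent figures from the Table III counts; the array layout (rows = spoken word, columns = recognized word, both in the order khalaF, amima, yasiR, yamin) is our choice.

```python
import numpy as np

WORDS = ["khalaF", "amima", "yasiR", "yamin"]
# Table III counts; rows = word actually spoken, columns = recognized word (order of WORDS).
TABLE_III = np.array([
    [97, 3, 100, 0],    # khalaF
    [19, 119, 22, 40],  # amima
    [0, 0, 200, 0],     # yasiR
    [3, 26, 2, 169],    # yamin
])


def recognition_rates(confusion):
    """Per-word and overall recognition rates (%) from a confusion matrix."""
    per_word = 100.0 * np.diag(confusion) / confusion.sum(axis=1)
    overall = 100.0 * np.trace(confusion) / confusion.sum()
    return per_word, overall


per_word, overall = recognition_rates(TABLE_III)
```

This gives 48.5, 59.5, 100.0 and 84.5% per word and 585/800 = 73.125% overall, reported as 73.13% in the text.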

Fig. 3. Error rate for individual Arabic words for both modes (multi-speaker mode and speaker-independent mode)

V. Subband Processing

In a classical system, which extracts acoustic parameters over the whole frequency band, all the coefficients are disrupted even when the noise occupies only a small part of the band. The subband method is more robust to noise confined to part of the frequency spectrum: in a subband system, only the information coming from the noisy subband is degraded, while the information coming from the other subbands can still be exploited for better recognition [5], [7], [15], [16]. Fig. 4 represents the case of a uniform division into 8 subbands using a two-band filter bank in a tree structure. H0(z) is the transfer function of a low-pass filter, whereas H1(z) is that of a high-pass filter. For reconstruction, the signals are upsampled and filtered respectively by a low-pass filter G0(z) and a high-pass filter G1(z), then recombined to restore the signal.

Fig. 4. Subband decomposition

After division into two subbands, each of the two bands contains as many samples as the original signal, so there are twice as many samples after the decomposition. To remedy this problem, we introduce downsampling, which keeps one sample out of two so that the original number of samples is recovered after every decomposition. The inverse operation is upsampling, which inserts zeros for the reconstruction of the original signal.

There is a wide choice of filter banks for the division into subbands [17], [18]. The low-pass and high-pass filters of a two-band IIR filter bank are given by [17], [18]:

$$H_0(z) = \frac{1}{2}\left[A_0(z^2) + z^{-1}A_1(z^2)\right] \qquad (1)$$

$$H_1(z) = \frac{1}{2}\left[A_0(z^2) - z^{-1}A_1(z^2)\right] \qquad (2)$$

where A0(z) and A1(z) are all-pass filters. The frequency responses of these filters are obtained by setting z = e^{jω}, where ω is the angular frequency, ω ∈ [−π, π]. Replacing z by e^{jω} in equations (1) and (2), we notice that H1(z) = H0(−z), that is:

$$\left|H_1(e^{j\omega})\right| = \left|H_0\!\left(e^{j(\pi-\omega)}\right)\right|$$

Since H0(z) is a low-pass filter, H1(z) is its mirror image, symmetric to H0(z) with respect to the axis ω = π/2. Such a filter bank is called a Quadrature Mirror Filter (QMF) bank [18]. The all-pass filters A0(z) and A1(z) may be written as follows:

$$A_i(z) = \frac{a_i + z^{-1}}{1 + a_i z^{-1}}, \qquad i = 0, 1$$

The coefficients a0 and a1 are normalized. The magnitude responses of the low-pass filter H0 and the high-pass filter H1 for a0 = 0.1576 and a1 = 0.6148 are represented in Fig. 5.

Fig. 5. Magnitude responses of the H0 and H1 filter bank

To estimate the importance of the speech bands, we calculated the energy of every band; the energy of the signal is an indicator that can, for example, contribute to the detection of voiced speech segments. The total energy E0 is computed directly in the time domain over a frame of signal {s_n}, 0 ≤ n ≤ N−1, as:

$$E_0 = \sum_{n=0}^{N-1} s_n^2$$

This allows the bands to be ranked in order of importance for use in the recognition task. The major part of the useful information is present in bands 1, 2 and 4, and these bands will be exploited in our recognition system. Fig. 6 shows the energy of each subband.

Fig. 6. Energy of each subband
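A minimal NumPy sketch of the 8-subband tree of Fig. 4 and of the per-band energy E = Σ s_n², assuming 16 kHz input (so each subband is 1 kHz wide) and the all-pass coefficients a0 = 0.1576 and a1 = 0.6148 quoted above. Equations (1) and (2) are implemented in polyphase form: A0 runs on the even samples x[2m] and A1 on the delayed odd samples x[2m−1], which is equivalent to filtering with H0 and H1 and then keeping one sample out of two. Because decimating a high band mirrors its spectrum, the recursion swaps the children of mirrored nodes so that the bands come out ordered from low to high frequency. The energies are returned as fractions of the total, the synthesis filters G0 and G1 are not shown, and all names are hypothetical.

```python
import numpy as np

A0_COEF, A1_COEF = 0.1576, 0.6148  # all-pass coefficients quoted for Fig. 5


def allpass(x, a):
    """First-order all-pass A(z) = (a + z^-1) / (1 + a z^-1)."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        y[n] = a * xn + x_prev - a * y_prev
        x_prev, y_prev = xn, y[n]
    return y


def qmf_split(x):
    """One two-band IIR QMF stage followed by decimation by 2 (polyphase form)."""
    if len(x) % 2:
        x = np.append(x, 0.0)
    even = x[0::2]                                        # x[2m]
    odd_delayed = np.concatenate(([0.0], x[1:-1:2]))     # x[2m-1]
    u, v = allpass(even, A0_COEF), allpass(odd_delayed, A1_COEF)
    return 0.5 * (u + v), 0.5 * (u - v)                  # low band, high band


def tree_subbands(x, levels=3, inverted=False):
    """Full binary QMF tree; subbands are returned from low to high frequency."""
    if levels == 0:
        return [x]
    low, high = qmf_split(x)
    if not inverted:
        return tree_subbands(low, levels - 1, False) + tree_subbands(high, levels - 1, True)
    # in a mirrored node, the decimated high output holds the lower physical frequencies
    return tree_subbands(high, levels - 1, False) + tree_subbands(low, levels - 1, True)


def relative_subband_energies(x, levels=3):
    """Energy sum(s_n ** 2) of each subband, normalised to sum to one."""
    bands = tree_subbands(np.asarray(x, dtype=float), levels)
    energies = np.array([np.sum(b ** 2) for b in bands])
    return energies / energies.sum()


fs = 16000
t = np.arange(fs) / fs
low_tone = relative_subband_energies(np.sin(2 * np.pi * 500 * t))    # 0-1 kHz band
high_tone = relative_subband_energies(np.sin(2 * np.pi * 5500 * t))  # 5-6 kHz band
```

With this ordering, a 500 Hz tone concentrates its energy in the first band and a 5.5 kHz tone in the sixth. The all-pass structure needs only two multiplier coefficients per stage, which is why such IIR QMF banks are popular for tree decompositions.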

VI. Arabic Subband Recognition System Evaluation

To increase the robustness of the recognition system (Fig. 2), several recognizers working in parallel on different parts of the signal are applied, and their respective scores are then recombined to make a final decision. The study concentrated on the multi-speaker mode (i.e., the same set of speakers was used in both the training and testing phases). We apply the parallel HMM-based recognition system presented in Fig. 1, using the decomposition of Fig. 4. To assess the relative importance of each frequency subband for word recognition, we carried out recognition tests on the 8 subbands. The results obtained with the various configurations [multi-speaker mode, 8 subbands, LPCC parameters, 3-state HMM] are presented in the confusion matrices of Tables 4 to 11.

We used the following decision strategy: for every test, the best possible decision is taken using the individual decisions of the subband recognizers. A test is therefore counted as a potential success if at least one of the subband recognizers gives the correct answer, and as a failure otherwise:

- If the correct speaker is recognized (that is, classified first) on at least one of the 8 trained subbands, the test is considered a potential success.
- If the correct speaker is not recognized on any subband, the test is considered a failure.

TABLE IV
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 1
Recognition rate (%): yamin 95.5

TABLE V
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 2
Recognition rate (%): yamin 84.5, yasiR 89.5, amima 49, khalaF 84; total 76.75

TABLE VI
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 3
Recognition rate (%): yamin 36.5

TABLE VII
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 4
Recognition rate (%): yamin 65.5, yasiR 66.5, amima 44.5, khalaF 69; total 61.37

TABLE VIII
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 5

TABLE IX
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 6
Recognition rate (%): yamin 31

TABLE X
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 7
Recognition rate (%): yamin 55

TABLE XI
CONFUSION MATRIX RELATING TO THE RECOGNITION SYSTEM OF ARABIC WORDS USING 12 LPCC COEFFICIENTS AND THREE-STATE HMM MODEL (MULTI-SPEAKERS MODE): BAND 8
Recognition rate (%): yamin 25.5
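The counting rule of this decision strategy (a test succeeds if at least one subband recognizer ranks the correct answer first) can be sketched as below, next to the rate of each subband recognizer taken alone. The array layout, function names and toy labels are assumptions for illustration only.

```python
import numpy as np


def potential_success_rate(subband_decisions, true_labels):
    """Percentage of tests in which at least one subband recognizer is correct.

    subband_decisions: (n_subbands, n_tests) labels ranked first by each subband.
    true_labels: (n_tests,) correct labels."""
    hits = np.asarray(subband_decisions) == np.asarray(true_labels)[None, :]
    return 100.0 * float(hits.any(axis=0).mean())


def per_subband_rates(subband_decisions, true_labels):
    """Recognition rate (%) of each subband recognizer taken alone."""
    hits = np.asarray(subband_decisions) == np.asarray(true_labels)[None, :]
    return 100.0 * hits.mean(axis=1)


# Toy example: 3 subband recognizers, 4 tests with correct labels 0..3.
true = np.array([0, 1, 2, 3])
decisions = np.array([[0, 0, 0, 0],
                      [1, 1, 1, 1],
                      [3, 3, 3, 3]])
oracle = potential_success_rate(decisions, true)
alone = per_subband_rates(decisions, true)
```

Each subband alone is right on only one test in four, yet three of the four tests count as potential successes. Because the rule assumes the correct subband can always be picked, it is an optimistic figure that no single recombination rule choosing among the subband decisions can exceed.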


The overall system accuracy is 95.25%. The worst performance was found for the word khalaF (accuracy of 90.5%), and the best performances were obtained for the words yasiR (accuracy of 100%), yamin (accuracy of 98%) and amima (accuracy of 92.5%). In general, this overall performance is acceptable given the complexity of the recognition task. Fig. 7 displays the error rate of the Arabic word recognition system in the wideband and subband modes.

Fig. 7. Error rate for Arabic recognition system (wideband mode and subbands mode)

VII. Conclusion

An experimental isolated-word recognition system for the Arabic language was implemented. We presented the bases of the subband architecture, and then mainly discussed the choice of a frequency-domain division and the selection of the best subbands for the task of automatic speaker recognition. The identification results obtained highlight the advantage of the subband approach. The results show that speaker-specific information is not equally distributed among the subbands; in particular, the low-frequency subbands (under 500 Hz) and the high-frequency subbands (over 2000 Hz) are more speaker-specific than the middle-frequency ones. However, the approach described above is far from optimal. Indeed, the frequency subbands are assumed to be independent, which seems hardly realistic. As future work, we suggest applying other classification methods to the subband system, adopting an optimal decision rule, investigating the benefits of subband processing further, and answering the question of the optimal number of subbands.

References

[1] E. Hagos, Implementation of an Isolated Word Recognition System, Master thesis, University of Petroleum and Minerals, Dhahran, Saudi Arabia, 1985.
[2] W. Abdulah, M. Abdul-Karim, Real-time spoken Arabic recognizer, International Journal of Electronics 59 (5) (1984) 645-648.
[3] Y. A. Alotaibi, Investigating spoken Arabic digits in speech recognition setting, Information Sciences, Volume 173 (1-3): 115-139 (2005).
[4] H. Bourlard, S. Dupont, Subband-based speech recognition, In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. 1251-1254, Munich, Germany, April 1997.
[5] L. Besacier, J. F. Bonastre, Subband approach for automatic speaker recognition: optimal division of the frequency domain, In Audio- and Video-based Biometric Person Authentication, Bigün et al., Eds., Springer LNCS 1206, 1997.
[6] L. Besacier, J. F. Bonastre, Frame Pruning for Speaker Recognition, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, 12-15 May 1998, Seattle (USA).
[7] K. Kirchhoff et al., Novel approaches to Arabic speech recognition, final report from the JHU summer workshop 2002, Tech. Rep., Johns Hopkins University, 2002.
[8] S. Tibrewala, H. Hermansky, Subband-based recognition of noisy speech, In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pp. 1255-1258, Munich, Germany, April 1997.
[9] D. O'Shaughnessy, Speech Communication: Human and Machine, IEEE Press, New York, second edition, 2000.
[10] S. J. Young, P. C. Woodland, W. J. Byrne, HTK Reference Manual, for HTK version 3.1, December 2001.
[11] M. A. Khawaja, N. G. Haider, Acoustic Analysis of Phonetics of Arabic Script Sindhi Language to evaluate Vowel-Consonant Segmentation, Journal of Independent Studies and Research (JISR), Volume 2, Number 2, July 2004.
[12] Z. Sakka, A. Kachouri, A. Benaissa, M. Samet, Automatic speech recognition using cepstral and Itakura-Saito distances for vocal command, Third International Conference on Systems, Signals & Devices, SSD'2005, 21-24 March 2005, Sousse, Tunisia.
[13] N. G. Haider, Transforms for Speech Recognition, Journal of Independent Studies and Research (JISR), Volume 3, Number 1, January 2005.
[14] Z. Sakka, A. Kachouri, M. Samet, Speech recognition with HMM models for cochlear prostheses, 2004 IEEE International Conference on Industrial Technology, IEEE-ICIT'2004, 8-10 December 2004, Hammamet, Tunisia.
[15] L. Souici, M. Sellami, A hybrid neuro-symbolic approach for Arabic handwritten word recognition, JACIII, Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 10, No. 1, January 2006.
[16] T. Farah, L. Souici, M. Sellami, Classifiers combination and syntax analysis for Arabic literal amount recognition, Engineering Applications of Artificial Intelligence, Volume 19, Issue 1, February 2006.
[17] J. H. Husoy, T. Gjerde, Computationally efficient subband coding of ECG signals, Medical Engineering and Physics, March 1996.
[18] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Englewood Cliffs, Prentice Hall, 1993.

Authors' information

¹Institut Supérieur de Biotechnologie de Sfax (ISBS), Université de Sfax, B.P. 261, 3038 Sfax, Tunisia.
²Laboratoire d'Electronique et des Technologies de l'Information (LETI), ENIS, Université de Sfax, B.P.W, 3038 Sfax, Tunisia. Phone: (+216) 98 438 956, Fax: (+216) 74 175 595.


Zied SAKKA was born in Sfax, Tunisia, in 1977. He received the electrical engineering degree in 2002 and a Master degree in electronics and telecommunications in 2003, both from the National School of Engineers of Sfax, Tunisia (ENIS). His research interests are in speech recognition and signal processing. Currently, he is Permanent Professor at ISBS, Higher Institute of Biotechnology of Sfax. E-mail: …yahoo.fr


Abdennaceur KACHOURI was born in Sfax, Tunisia, in 1954. He received the engineering diploma from the National School of Engineering of Sfax in 1981, a Master degree in Measurement and Instrumentation from the National School of Bordeaux (ENSERB), France, in 1981, and a Doctorate in Measurement and Instrumentation from ENSERB in 1983. He works on several collaborations with communication research groups in Tunisia and France. Currently, he is Permanent Professor at ENIS School of Engineering and a member of the "LETI" Laboratory, ENIS Sfax. E-mail: Abdennaceur.Kachouri@enis.rnu.tn


Mounir SAMET was born in Sfax, Tunisia, in 1955. He obtained an Engineering Diploma from the National School of Engineering of Sfax in 1981, a Master degree in Measurement and Instrumentation from the National School of Bordeaux (ENSERB), France, in 1981, a Doctorate in Measurement and Instrumentation from ENSERB in 1981, and the Habilitation Degree (Post Doctorate degree) in 1998. He works on several collaborations with medical research groups in Tunisia and France. Currently, he is Permanent Professor at ENIS School of Engineering and a member of the "LETI" Laboratory, ENIS Sfax. E-mail: mounir.samet@enis.rnu.tn
