A Speech Recognition System Based on Hybrid Wavelet Network Including a Fuzzy Decision Support System

Olfa Jemai, Ridha Ejbali, Mourad Zaied, and Chokri Ben Amar

REGIM-Lab.: REsearch Groups in Intelligent Machines, University of Sfax, ENIS, BP 1173, Sfax, 3038, Tunisia

ABSTRACT


This paper develops a novel approach to speech recognition based on a wavelet network trained by the fast wavelet transform (FWN) and including a fuzzy decision support system (FDSS). Our contributions are twofold. First, we propose a novel learning algorithm for speech recognition based on the fast wavelet transform (FWT), which resolves a major problem of previous works in computing the connection weights. In those works, the weights were determined by a direct solution that requires a matrix inversion, which may be computationally intensive; in the new algorithm, they are computed by the iterative application of the FWT. Second, we propose a new classification method for this speech recognition system, which mimics a human reasoning mode by employing an FDSS to compute similarity degrees between test and training signals. Extensive experiments were conducted to compare the proposed approach with other approaches. The results show that the new speech recognition system performs better than previously established ones.


Keywords: Speech recognition, Fast training algorithm, Fuzzy decision support system.

1. INTRODUCTION


Speech is one of the most natural ways in which human beings exchange information. In recent years, extensive research has been devoted to developing machines that can understand and produce speech as naturally as humans do. Computer scientists have therefore been investigating ways to enable people to talk to a computer by making it recognize what they say. The list of applications of automatic speech recognition keeps growing.

The literature indicates that wavelets and wavelet networks are a promising approach for many applications. Both were used in [1, 2] to recognize the state of a driver's eyes in order to prevent hypovigilance. In [3] and [4], hand gesture recognition was achieved using the FWN in order to command computers. In [5], a wavelet network was used to approximate acoustic units for the task of speech recognition. The authors of [6, 7] proposed a new method of 2D and 3D facial recognition based on a compact and representative biometric signature produced by means of wavelet networks. Wavelet networks were also used for image copy detection in [8], for content-based image retrieval in [9], and for image compression in [10]. Wavelet networks have thus widely proved their effectiveness in the classification domain [11, 12]. Furthermore, recent work in this direction [12, 13] has developed an efficient algorithm that effectively classifies different datasets.

The main challenge for us is to propose a novel learning algorithm based on the fast wavelet transform (FWT) for speech recognition, and to include a fuzzy decision support system (FDSS) that computes similarity degrees between the query and the training signals in order to carry out the decision-making phase.

The remainder of the paper is organized as follows. Section 2 reviews the wavelet network architecture and its learning algorithm. Section 3 outlines the proposed approach for speech recognition, and Section 4 describes the general structure of the fuzzy logic system. Section 5 presents the experimental results, which illustrate the effectiveness of the proposed method. Section 6 gives the conclusion and perspectives for future work.

2. TECHNICAL BACKGROUND

This section presents an FWN classifier model that is able to approximate every sample (D) to produce a data signature.

Seventh International Conference on Machine Vision (ICMV 2014), edited by Antanas Verikas, Branislav Vuksanovic, Petia Radeva, Jianhong Zhou, Proc. of SPIE Vol. 9445, 944503 © 2015 SPIE · CCC code: 0277-786X/15/$18 · doi: 10.1117/12.2180554 Proc. of SPIE Vol. 9445 944503-1

2.1 Wavelet network in our algorithm

The wavelet network used in our algorithm can be considered as composed of three layers [14] and has the following architecture:

Figure 1. Wavelet network architecture

The activation functions of the hidden neurons in this architecture are dilated and translated versions of both scaling and wavelet functions ($g_i$). Thanks to their fundamental property of parsimonious approximation, wavelet networks are well suited to modeling any continuous function. Hence, the wavelet network is defined by weighting a set of these functions with weight values ($\omega_i$) to approximate a given data sample $D$:

$$D \approx \sum_{i=1}^{n} \omega_i \, g_i$$

2.2 Learning algorithm

This section shows how a wavelet network can be trained using only the fast wavelet transform (FWT), whose iterative application yields the connection weights ($\omega_i$). Before computing the weights, we prepared a library of candidate wavelet and scaling functions to be used as activation functions of the wavelet network [14]. To compute the weights corresponding to these activation functions, we analyzed the data sample $D$ $k$ times (where $k$ is the number of scales) with the dual filters $(h, g)$ [15] of the dual scaling and wavelet functions $(\tilde{\varphi}, \tilde{\psi})$. The result was a set of coefficients $d_i$ corresponding to the wavelet weights at the scales $i$ ($i = 1, \dots, k$) and the weights $a_k$ of the scaling function at scale $k$.
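The iterative analysis described above can be illustrated with a minimal sketch. It assumes the orthonormal Haar filter pair purely for simplicity (the paper's own wavelet family and dual filters are not reproduced here), and the function names `fwt_analysis` and `fwt_synthesis` are hypothetical. Each pass splits the current approximation into a coarser approximation and a detail vector, so after `levels` passes we hold the detail weights of every scale and the approximation weights of the coarsest scale.

```python
import numpy as np


def fwt_analysis(signal, levels):
    """Apply a two-channel Haar filter bank `levels` times (Haar is an assumption).

    Returns (a_k, [d_1, ..., d_k]): the coarsest approximation coefficients
    and the detail coefficients of each scale, finest scale first.
    """
    a = np.asarray(signal, dtype=float)
    if levels < 1 or len(a) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        a = (even + odd) / np.sqrt(2.0)              # low-pass branch
    return a, details


def fwt_synthesis(a, details):
    """Invert fwt_analysis: rebuild the signal from its weights."""
    a = np.asarray(a, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a
```

No matrix inversion is involved: each weight comes from a filtering and downsampling step, which is the point made in the text. With an orthonormal pair such as Haar, the analysis and dual filters coincide; a biorthogonal family would need separate analysis and synthesis filters.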

3. OVERVIEW OF THE PROPOSED APPROACH TO SPEECH RECOGNITION

The solution we propose for recognizing speech can be split into two main stages. The first is the learning phase, which approximates the training speech signals by the FWN. The second is the recognition phase, which includes an FDSS in order to compute similarity degrees between the query signals and the learning ones.

3.1 Learning phase

As mentioned above, every signal of the learning database was approximated by an FWN to produce a data signature. This signature was composed of wavelet and scaling functions ($g_i$) and their corresponding coefficients ($\omega_i$). The information obtained was used to match a test signal against all signals in the training database (Figure 2). The learning database was composed of classes of speech signals. Each class consisted of samples $Y_i$ ($i \le m$), where $m$ is the number of classes. The FWN was optimized for each example $Y_i$ and stored in a database containing all network models. These network models were then used in the classification phase explained in Section 3.2.
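As a rough illustration of the learning phase, the sketch below stores one signature per training sample. The `Signature` record, the `build_database` function, and the `one_level_haar` analyzer are all hypothetical names introduced for this example; the analyzer is a one-level Haar stand-in for the FWN approximation, not the authors' network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

import numpy as np

# Hypothetical analyzer type: maps a signal to (approximation, detail) weights.
Analyzer = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]


@dataclass(frozen=True)
class Signature:
    """Stored model of one training sample: class label plus its weights."""
    label: str
    approx: np.ndarray
    detail: np.ndarray


def build_database(training_set: Dict[str, Sequence[Sequence[float]]],
                   analyze: Analyzer) -> List[Signature]:
    """Approximate every training sample and keep its signature."""
    database = []
    for label, samples in training_set.items():
        for sample in samples:
            approx, detail = analyze(np.asarray(sample, dtype=float))
            database.append(Signature(label, approx, detail))
    return database


def one_level_haar(x: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Stand-in analyzer (an assumption): a single Haar analysis step."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)
```

Keeping approximation and detail weights in separate fields anticipates the recognition phase, where the two kinds of coefficients are compared separately.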

Figure 2. Training phase

3.2 Recognition phase

In the recognition stage, the signal to be recognized was projected onto all wavelet networks of the learning signals. After each projection, the family of activation functions remained unchanged (that of the learning signal), while new weights ($\omega'_i$) were computed. The coefficient vectors ($\omega_i$ and $\omega'_i$) were used to compute similarity degrees between the query signal and the learning ones. This step was carried out using fuzzy measures.


3.2.1 Computing similarity degrees

In [16, 17], the similarity degree of two data samples was obtained by computing the Euclidean distance between the two coefficient vectors ($\omega_i$ and $\omega'_i$). The activation functions used to produce the coefficient vectors are of two types (scaling and wavelet functions), so the resulting coefficients are necessarily of two types as well: the approximation coefficients ($a_k$) produced by the scaling functions and the detail coefficients ($d_k$) resulting from the wavelet functions (where $k$ is the number of coefficients in each coefficient vector). The two types of coefficients are generally not in the same range of values, which may distort the distance measurement. We therefore decided to split each vector into two, containing the approximation and the detail coefficients separately. Then, to recognize a speech signal, we computed distances between vectors containing the same type of coefficients. The resulting distances are the inputs of an FDSS, as illustrated in Figure 3, which computes the similarity degree between the two speech signals in order to carry out the decision-making phase (deciding to which class a speech signal belongs).


Figure 3. Computing similarity degree between two signals

Distances between the coefficients of a query signal and those of the training signals ($dsa_i$ and $dsd_i$) were computed according to the expressions below:

$$dsa_i = \sqrt{\sum_{j=1}^{k} \left(V_{aT}(j) - V_{aL}(j)\right)^2}$$

$$dsd_i = \sqrt{\sum_{j=1}^{k} \left(V_{dT}(j) - V_{dL}(j)\right)^2}$$

With:

$k$ = number of approximation coefficients and detail coefficients.
$dsa_i$ = distance between the approximation coefficient vectors of the query sample and a learning sample $i$.
$dsd_i$ = distance between the detail coefficient vectors of the query sample and a learning sample $i$.
$V_{aT}$ = approximation coefficient vector of the query sample.
$V_{aL}$ = approximation coefficient vector of a learning sample.
$V_{dT}$ = detail coefficient vector of the query sample.
$V_{dL}$ = detail coefficient vector of a learning sample.

Once all similarity degrees between the sample to be classified and the training ones were computed, the classifier could decide to which class the query sample belongs. The training class having the largest similarity degree with the query sample was taken as the result of the classification phase.
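The two distances can be computed directly from the coefficient vectors. The following is a minimal sketch; the function name `coefficient_distances` and its argument names are hypothetical, and it assumes the query and learning vectors of each type have the same length $k$.

```python
import numpy as np


def coefficient_distances(query_approx, query_detail, train_approx, train_detail):
    """Return (dsa, dsd): Euclidean distances computed per coefficient type."""
    va_t = np.asarray(query_approx, dtype=float)
    vd_t = np.asarray(query_detail, dtype=float)
    va_l = np.asarray(train_approx, dtype=float)
    vd_l = np.asarray(train_detail, dtype=float)
    if va_t.shape != va_l.shape or vd_t.shape != vd_l.shape:
        raise ValueError("query and learning vectors must have matching lengths")
    dsa = float(np.sqrt(np.sum((va_t - va_l) ** 2)))  # approximation distance
    dsd = float(np.sqrt(np.sum((vd_t - vd_l) ** 2)))  # detail distance
    return dsa, dsd
```

Keeping the two distances separate, rather than concatenating the vectors, prevents the coefficient type with the larger value range from dominating the comparison.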

4. GENERAL STRUCTURE OF FUZZY LOGIC SYSTEM

A fuzzy logic system consists of the following modules:

- Fuzzification: The fuzzifier receives crisp numeric measurements from the environment as input and maps them into linguistic concepts by using suitable membership functions. Trapezoidal and triangular membership functions are generally chosen. In this work, the triangular shape was retained [14].
- Inference engine and rule base: The fuzzy engine processes all computed membership values using fuzzy set operations and consults the fuzzy rule base to identify the most suitable fuzzy output. Several inference methods exist, such as MAX-MIN, MAX-PROD and SUM-PROD. In this work, the MAX-MIN method was used because it is simple to implement.
- Defuzzification: To be used in the real world, the fuzzy output of the inference engine needs to be converted back to the crisp domain by the defuzzifier, using suitable membership functions. The center of gravity method [18] was used in this work to determine the global degree of similarity.
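The three modules can be sketched end to end as follows. Only the triangular membership shape, MAX-MIN inference and center-of-gravity defuzzification come from the text; the three linguistic terms, their breakpoints, the rule base and the assumption that both distances are normalized to [0, 1] are illustrative choices made for this example, as is the function name `similarity_degree`.

```python
import numpy as np

TERMS = ("low", "medium", "high")
# Assumed triangular sets (left foot, peak, right foot) on a [0, 1] universe.
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
# Hypothetical rule base: the larger of the two distances drives the output,
# and a large distance means a low similarity.
OPPOSITE = {"low": "high", "medium": "medium", "high": "low"}
RULES = {(ta, td): OPPOSITE[max(ta, td, key=TERMS.index)]
         for ta in TERMS for td in TERMS}


def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def similarity_degree(dsa, dsd, resolution=1001):
    """Map two normalized distances to a crisp similarity degree in [0, 1]."""
    dsa, dsd = float(np.clip(dsa, 0.0, 1.0)), float(np.clip(dsd, 0.0, 1.0))
    universe = np.linspace(0.0, 1.0, resolution)
    aggregated = np.zeros_like(universe)
    for (ta, td), out in RULES.items():
        # MIN: firing strength of the rule "dsa is ta AND dsd is td".
        strength = min(float(tri(dsa, *SETS[ta])), float(tri(dsd, *SETS[td])))
        clipped = np.minimum(strength, tri(universe, *SETS[out]))
        # MAX: aggregate the clipped output sets of all rules.
        aggregated = np.maximum(aggregated, clipped)
    # Center of gravity of the aggregated output set (discretized).
    return float(np.sum(universe * aggregated) / np.sum(aggregated))
```

With these assumed sets, two small distances yield a similarity degree well above 0.5 and two large distances a degree well below it, so the class of the training sample with the largest degree can be selected as described in Section 3.2.1.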

5. EXPERIMENTAL RESULTS


To highlight the robustness of the FWN-FDSS, and to allow a fair comparison, experiments were performed on the same corpus, segmented from the work of Boudraa [19], using the same type and number of MFCC coefficients to describe the units. We therefore evaluated the recognition rates with the same evaluation protocol, using 24 classes of Arabic words. The average recognition rate (RR) obtained by applying the FWN-FDSS to this corpus was about 98.33%, while the global RRs obtained in [5] were 91.7% and 92.66% using HMM and WN models, respectively. Figure 4 illustrates the recognition rate of each Arabic word for the three speech recognition systems.

Figure 4. Recognition rates of each Arabic word

According to Figure 4, the speech recognition system based on fast wavelet networks including an FDSS gives, in most cases, a better recognition rate than the systems based on wavelet networks and on hidden Markov models. The properties of Beta wavelets demonstrate the effectiveness of these tools. The advantages of this new technique for speech recognition based on Beta wavelets are:

- Detailed analysis of signals through a multiresolution study and access to all frequencies of a signal.
- Sophisticated modeling of signals by a wavelet network.
- Unrestricted adaptation of wavelets in hidden layers to model the signals of the training corpus (non-oriented choice of translations and dilations in hidden layers) [15].

6. CONCLUSION AND FUTURE WORKS

This paper contributes to the domain of speech recognition by creating a new speech recognizer whose architecture is based on an FWN and an FDSS. The novelty of this paper resides in this new architecture based on fast wavelet networks. Each acoustic unit was modeled by a fast wavelet network that refines the exposure of its characteristics. Moreover, a new classification method for the FWN was introduced, which uses fuzzy measures to compute similarities between query samples and training ones. Comparisons on a corpus of Arabic words from the literature have shown that our approach performs better than the other ones. To improve this work, we intend to extend it by testing it on other corpora in the speech recognition field.

ACKNOWLEDGMENTS

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.


REFERENCES


[1] I. Teyeb, O. Jemai, T. Bouchrika, and C. Ben Amar, Detecting Driver Drowsiness Using Eyes Recognition System Based on Wavelet Network. 5th International Conference on Web and Information Technologies, Tunisia, Hammamet, 245–254, May (2013).
[2] O. Jemai, I. Teyeb, T. Bouchrika, and C. Ben Amar, A Novel Approach for Drowsy Driver Detection Using Eyes Recognition System Based on Wavelet Network. IJES: International Journal of Recent Contributions from Engineering, Science & IT. 1, 46–52 (2013).
[3] T. Bouchrika, M. Zaied, O. Jemai, and C. Ben Amar, Ordering computers by hand gestures recognition based on wavelet networks. Int. Conf. on Communications, Computing and Control Applications proceedings, Marseilles, France. DOI: 10.1109/CCCA.2012.6417911, 1–6, December 06-08 (2012).
[4] T. Bouchrika, M. Zaied, O. Jemai, and C. Ben Amar, Neural solutions to interact with computers by hand gesture recognition. MTAP: Multimedia Tools and Applications. Volume 72, Issue 3, 2949–2975, DOI: 10.1007/s11042-013-1557-y (2014).
[5] R. Ejbali, M. Zaied, and C. Ben Amar, Wavelet network for recognition system of Arabic word. International Journal of Speech Technology, Vol. 13(3), Springer-Verlag, New York, 163–174 (2010).
[6] S. Said, B. Ben Amor, M. Zaied, C. Ben Amar, and M. Daoudi, Fast and efficient 3d face recognition using wavelet networks. International Conference on Image Processing - ICIP, Egypt, 4153–4156 (2010).
[7] M. Zaied, C. Ben Amar, and A. M. Alimi, Beta Wavelet Networks for Face Recognition. Journal of Decision Systems JDS. Lavoisier 2005 Edition. 14(1-2), 109–122 (2005).
[8] A. EL Adel, M. Zaied, C. Ben Amar, Learning wavelet networks based on Multiresolution analysis: Application to images copy detection. International Conference on Communications, Computing and Control Applications (CCCA’2011), Hammamet-Tunisia, March 3-5 (2011).
[9] B. Guedri, M. Zaied, C. Ben Amar, Indexing and images retrieval by content. IEEE 2011 International Conference on High Performance Computing & Simulation (HPCS’2011), Istanbul Turkey, 369–375, July 4-8 (2011).
[10] C. Ben Amar, O. Jemai, Wavelet Networks Approach for Image Compression. ICGST: The International Congress for global Science and Technology, GVIP International Journal on Graphics, Vision and Image Processing, GVIP Special Issue on Image Compression, 15–23 (2007).
[11] O. Jemai, M. Zaied, C. Ben Amar, A. M. Alimi, Pyramidal Hybrid Approach: Wavelet Network with OLS Algorithm Based-Image Classification. IJWMIP: International Journal of Wavelets, Multiresolution and Information Processing. 9(1), 111–130 (2011).
[12] O. Jemai, M. Zaied, C. Ben Amar, A. M. Alimi, Fast Learning algorithm of wavelet network based on Fast Wavelet Transform. Int. J. Pattern Recognition and Artificial Intelligence. 25(8), 1297–1319 (2011).


[13] O. Jemai, M. Zaied, C. Ben Amar, A. M. Alimi, FBWN: an architecture of Fast Beta Wavelet Networks for Image Classification. 2010 IEEE World Congress on Computational Intelligence, the 2010 International Joint Conference on Neural Networks, CCIB, Barcelona, Spain, 1953–1960, July 18-23 (2010).
[14] A. M. Murshid, S. A. Loan, Architectural design of fuzzy inference processor using triangular-shaped membership function. Open Systems (ICOS), IEEE Conference. 16–20, September (2011).
[15] R. Ejbali, M. Zaied, C. Ben Amar, Intelligent approach to train wavelet networks for Recognition System of Arabic Words. KDIR International Joint Conference on Knowledge Discovery and Information Retrieval, Valencia Spain, 25-28 October. 518–522 (2010).
[16] M. Zaied, S. Said, O. Jemai, C. Ben Amar, A novel approach for face recognition based on fast learning algorithm and wavelet network theory. IJWMIP: International Journal of Wavelets, Multiresolution and Information Processing. 9(6), 923–945 (2011).
[17] M. Zaied, O. Jemai, C. Ben Amar, Training of the Beta wavelet networks by the frames theory: Application to face recognition. The International Workshops on Image Processing Theory, Tools and Applications, Tunisia, 165–170, November (2008).
[18] V. L. Werner, E. K. Etienne, Defuzzification: criteria and classification. Fuzzy Sets and Systems. 108, 159–178 (1999).
[19] M. Boudraa, B. Boudraa, Twenty list of ten arabic Sentences for Assessment. ACUSTICA acta acoustica. 86(43.71), 870–882, November (1998).
