speech recognition with low-cost microcontrollers - IADAT

SPEECH RECOGNITION WITH LOW-COST MICROCONTROLLERS

Carlos Bernal-Ruiz, Francisco E. García-Tapias, Bonifacio Martín-del-Brío, and Antonio Bono-Nuez

Abstract

Speech recognition tools for human-machine interaction (HMI) in consumer equipment have recently become a reality thanks to improvements in pattern recognition and signal processing technologies and to the development of high-performance, low-cost microcontrollers. This paper presents a compact system for recognizing phonemes and small vocabularies, aimed at consumer applications, where cost is of paramount importance. The idea is that the user can operate a home appliance (TV set, washing machine, etc.) by means of speech commands. The speech recognition method must therefore be as simple as possible, so that it can be programmed onto a standard microcontroller under the constraints of a typical low- to mid-range embedded application: about 1 kilobyte of RAM, limited computing resources (8/16-bit integer arithmetic), a low clock frequency (in the MHz range), portability across different microcontroller architectures, low-resolution A/D converters, and a low sampling frequency.

First, the speech signal is sampled at 6,000 samples per second. Then the linear cepstrum (LFC) is used for speech processing because of its relatively low computational cost compared with the techniques used in computer-based applications (or on powerful DSP processors), such as Mel-cepstrum (MFCC) or LPCC analysis. The time-domain analysis is limited to a single Hamming window; a typical dynamic programming algorithm such as Dynamic Time Warping (DTW) cannot be used in our case because of the constraints listed above, especially the limited RAM available.
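The windowing step can be illustrated with a minimal C sketch that uses only 16/32-bit integer arithmetic, in the spirit of the constraints described above. It is not the authors' code: the frame length `FRAME_LEN`, the Q15 coefficient table `hamming_q15`, and the function `apply_hamming_q15` are all hypothetical names and values chosen for illustration. The table holds w[n] = 0.54 - 0.46 cos(2πn/(N-1)) for a toy frame of N = 8 samples; a real frame would be much longer, with its table precomputed offline and stored in ROM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 8  /* toy frame length (assumption); real frames are longer */

/* Hamming window w[n] = 0.54 - 0.46*cos(2*pi*n/(FRAME_LEN-1)),
 * precomputed offline and rounded to Q15 (1.0 == 32768). */
static const int16_t hamming_q15[FRAME_LEN] = {
    2621, 8297, 21049, 31275, 31275, 21049, 8297, 2621
};

/* Multiply each sample by its window coefficient.
 * Only integer operations are used; no floating point is required. */
void apply_hamming_q15(const int16_t in[FRAME_LEN], int16_t out[FRAME_LEN])
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < FRAME_LEN; i++) {
        /* 16x16 -> 32-bit product, then scale back from Q15.
         * Integer division truncates toward zero. */
        int32_t prod = (int32_t)in[i] * hamming_q15[i];
        out[i] = (int16_t)(prod / 32768);
    }
}
```

Calling `apply_hamming_q15(frame, windowed)` on a frame of raw A/D samples attenuates the frame edges and leaves the centre almost unchanged. Keeping the window as a constant table means it costs no RAM, which matters when only about 1 kilobyte is available.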
Next, a pattern recognition stage is carried out by an LVQ (Learning Vector Quantization) neural network, previously trained with a set of patterns from a limited vocabulary. The very simple pattern distance calculation that LVQ allows is especially attractive given the limited computing resources available. In addition, the neural network parameters are adjusted to reach an acceptable trade-off among precision, noise immunity, and implementation complexity. Finally, we estimate the performance of the complete speech recognition system by means of several parameters, using validation groups in Spanish. The prototype has been programmed onto a Mitsubishi M16C, a low-cost 16-bit microcontroller (about 5 euros), and trained in a real environment (signal-to-noise ratio of 20 dB). Nevertheless, it can easily be adapted to microcontrollers from other manufacturers, because the speech recognition system has been developed in C.
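To show why the LVQ distance calculation suits integer-only hardware, here is a minimal C sketch of the classification step: the input pattern is assigned the label of the nearest prototype under squared Euclidean distance. This is an illustration under stated assumptions, not the authors' implementation. The sizes `N_COEFFS`, the type `lvq_proto_t`, and the functions `dist2` and `lvq_classify` are hypothetical, and the codebook is assumed to have been trained offline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_COEFFS 4  /* features per pattern (hypothetical value) */

typedef struct {
    int16_t w[N_COEFFS];  /* prototype (codebook) vector, trained offline */
    uint8_t label;        /* class index, e.g. a word of the vocabulary */
} lvq_proto_t;

/* Squared Euclidean distance. No square root is needed to rank prototypes.
 * Assumes features are scaled so that the sum fits in 32 bits. */
static uint32_t dist2(const int16_t *x, const int16_t *w)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < N_COEFFS; i++) {
        int32_t d = (int32_t)x[i] - w[i];
        uint32_t ad = (uint32_t)(d < 0 ? -d : d);  /* |d| <= 65535 */
        acc += ad * ad;
    }
    return acc;
}

/* Return the label of the prototype closest to the input pattern x. */
uint8_t lvq_classify(const lvq_proto_t *protos, size_t n_protos,
                     const int16_t *x)
{
    assert(n_protos > 0);
    size_t best = 0;
    uint32_t best_d = dist2(x, protos[0].w);
    for (size_t k = 1; k < n_protos; k++) {
        uint32_t d = dist2(x, protos[k].w);
        if (d < best_d) {
            best_d = d;
            best = k;
        }
    }
    return protos[best].label;
}
```

Each comparison costs one subtraction, one multiplication, and one addition per coefficient, and the codebook can be declared `const` so that it is placed in ROM rather than RAM.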

