SPEECH CONTROL FOR CAR USING THE TMS320C6701 DSP

Martin Petriska, Dušan Považanec, Peter Fuchs

Department of Radio and Electronics, FEI STU, Ilkovičova 3, 812 19 Bratislava
Abstract

This paper presents the design of a voice module intended for use in cars. The module communicates with the driver through speech: it informs the driver about the state of the car's equipment and recognizes spoken commands. This makes it easy to control much of the car's equipment by voice. The system can also be used for speaker verification to protect the car against theft. The system consists of a DSP board with a TMS320C6701, large memories and an analog codec. The current state of speech recognition and speech synthesis, with regard to applications in car technology, is also described.

Introduction

Is it possible to hold a voice conversation with a car, or is it only science fiction? Since 1998 there has been the Clarion AutoPC, the first in-dash computer/personal assistant controlled through voice activation. Voice control does just about everything a push-button would do. Conversation with a machine can be divided into two parts: speech synthesis and speech recognition. In speaking mode, the speech signal is generated according to commands that the central car computer sends to the module. The messages can be as varied as the car position acquired from a GPS module, the name of a telephone caller, telephone numbers, current traffic information, e-mails, etc. In recognition mode, the module analyses and executes the speaker's commands, e.g. to dial a telephone number, tune the car radio, open the windows, run the air conditioning or control other car peripherals.

Commercial voice controlled systems for cars

The Clarion AutoPC has been on the US market since 1998. Drivers can tell the Clarion AutoPC to select tracks from its optional six-CD changer, activate or deactivate the navigation unit, scan or select specific radio stations, or adjust the volume. When an e-mail arrives, the Clarion AutoPC automatically informs the driver.
The speech engine allows users to browse through messages by voice and hear who sent them, when they were sent and what the subject lines say. The same device gives drivers spoken traffic alerts about accidents and driving conditions. Speech technology lets drivers dial preset numbers simply by telling the unit to select a name in the address book. And since it is all controlled through voice activation, the driver never has to take his hands off the wheel.

In Europe there is Blaupunkt's TravelPilot RNS149, which combines car audio and a navigation system in one unit. The GPS-based built-in navigation system gives turn-by-turn guidance with natural-sounding voice output.
Figure 1: Functional diagram of TTS system

Speech synthesis

Speech synthesis in the car voice module can be used for reading e-mails, SMS messages from a mobile phone, news from web pages or other information that is available in text form and can easily be filtered by interest. Text-independent speech synthesis is in most systems based on PSOLA methods; TD-PSOLA is currently one of the most popular concatenation methods. These methods are used for their simplicity, high voice quality and naturalness. Therefore we decided to use them in our voice module. We have developed Slovak speech synthesizers for personal computers and ported them to the DSP system.

The block scheme of the TTS system is shown in Figure 1. It consists of two main parts: the linguistic part and the digital signal processing part. The linguistic part consists of three modules: the text pre-processing module, the phonetic transcriptor and the prosody generator. The text pre-processing module processes the input text and converts numbers, abbreviations, etc. into words. The phonetic transcriptor transforms the text into a sequence of diphone codes. Diphones are the segments of speech used for our concatenative speech synthesis. The prosody generator derives three prosodic parameters from the text: F, the fundamental frequency (pitch); T, the diphone length; and l, the loudness. Prosodic features have specific functions in speech communication. They convey information such as relationships between words, finality or continuation, and segmentation of the sentence into groups of syllables.

Speech recognition

Speech recognition is a very complex problem. It requires many algorithms, which leads to high computational requirements. To meet these performance requirements, the new TMS320C6x DSP was chosen. Our project consists of two steps. In the first step we implement speaker verification. Speaker verification is the process of accepting or rejecting the identity claim of a speaker.
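The accept/reject decision in speaker verification can be illustrated with dynamic time warping (DTW), a common template-matching technique. The following is only a minimal sketch under stated assumptions, not the implementation used in the module: the function names, the Euclidean local distance, the normalization by the summed sequence lengths and the `threshold` value are all hypothetical choices made for illustration, and the feature vectors are assumed to have been extracted from the speech signal beforehand.

```python
import math


def dtw_distance(ref, test):
    """Length-normalized DTW distance between two feature sequences.

    ref, test: non-empty lists of feature vectors (lists of floats)
    that all have the same dimension. (Hypothetical helper.)
    """
    n, m = len(ref), len(test)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning ref[:i] with test[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between two frames (assumed Euclidean)
            d = math.dist(ref[i - 1], test[j - 1])
            # Allowed steps: vertical, horizontal, diagonal
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Normalize so that utterances of different length are comparable
    return cost[n][m] / (n + m)


def verify_speaker(ref, test, threshold=0.5):
    """Accept the identity claim if the DTW distance is small enough.

    The threshold is an assumed value; in practice it would be tuned
    on recorded data.
    """
    return dtw_distance(ref, test) <= threshold
```

For example, a test utterance that is only a time-stretched copy of the stored reference yields a distance of zero and is accepted, while a sequence with clearly different feature values is rejected. The warping is what makes the comparison tolerant of differences in speaking rate.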
The algorithm for speaker verification will be based on DTW (dynamic time warping). The second step is command recognition, which needs additional memory to store the reference utterances. We plan to use speech synthesis, with speech parameters adapted to the recognized speaker's voice, to generate arbitrary utterances that will be compared with the speaker's voice. This solution will save memory space, because the reference words will be generated on the fly from the text.

At present we are testing various recognition methods on the PC. There are many open source projects that can serve as a model for developing our speech recognizer, but speech synthesis and speech recognition are language dependent. Therefore Slovak lexical and prosodic rules and a Slovak diphone database must be created for speech synthesis, as well as a Slovak speech database for speaker-independent recognition of the Slovak language. So far we have created two diphone databases and the prosodic and lexical rules for Slovak TTS. The TTS without prosodic rules has been ported to the DSP and now works as a spoken value reader in a power supply meter.

Figure 2: The block scheme of the DSP part

Hardware - DSP part with TMS320C6701

The heart of the speech processor is a DSP board with a TMS320C67xx processor. The device takes advantage of large on-board RAM and ROM memories, where the sound parameters are stored. The main reason for using the most recent TI floating-point DSPs in speech processing is their high performance. The TMS320C6701 and TMS320C6711 are very powerful and are therefore suitable for complex, computationally demanding applications such as speech analysis. A DSP board with the TMS320C6701 processor has already been designed and is used in a power supply meter. It uses the TTS system with the PSOLA algorithm for reading out the measured values.

Summary

At present there are several workplaces in Slovakia engaged in speech processing, at different stages of development. Good speech recognition requires a large speech database. The SpeechDat-E project, a fixed telephone speech database, was completed at the Slovak Academy of Sciences. Work on this project lasted approximately one year. For our project we need a speech database recorded in a car, but the first experiments can be done with this database.
Digital signal processors such as the TMS320C6701 or TMS320C6711 make it possible to design a system with very high computational power and large memory space using a minimal number of components, which saves printed circuit board space and simplifies the design. These processors are very suitable for speech processing. The DSP board can easily be installed in car modules as a voice interface to the central car computer. Work on the software for this module is still in the development stage, because speech processing, and especially speech recognition, is a very complex problem and needs a lot of time. Speech synthesis for this module also needs further development of the prosodic module.

Figure 3: Detail of DSP module with TMS320C6701

This project is supported by project VTP 95/5195/297, with the participation of the Slovak Academy of Sciences through project VEGA 47/0214/99.
References

[1] D. Považanec, M. Petriska, P. Fuchs, “Modern approaches to the speech synthesis based on DSP processors”, in Proceedings of the conference NOVTECH ’99, Žilina, November 24-26, 1999, pp. 41-46.
[2] T. Dutoit, “An Introduction to Text-To-Speech Synthesis”, Kluwer Academic Publishers, Dordrecht, hardbound, April 1997, 312 pp., ISBN 0-7923-4498-7.
[3] http://www.ti.com
[4] http://www.autopc.com
[5] http://www.lhs.com
[6] http://www.speechdat.org