FTC 2016 - Future Technologies Conference 2016 6-7 December 2016 | San Francisco, United States
Human/Machine Interface Dialog Integrating New Information and Communication Technology for Pathological Voice
Brahim-Fares Zaidi
Laboratory of Speech Communication and Signal Processing (LSCSP) University U.S.T.H.B Algiers, Algeria
[email protected]
Sid-Ahmed Selouani
Laboratory of Research in Human-System Interaction (LARHSI) University of Moncton, Shippagan Campus Moncton, Canada
[email protected]
Malika Boudraa
Laboratory of Speech Communication and Signal Processing (LSCSP) University U.S.T.H.B Algiers, Algeria
[email protected]
Ghania Hamdani
Laboratory of Speech Communication and Signal Processing (LSCSP) University U.S.T.H.B Algiers, Algeria
[email protected]
Abstract—Man-machine dialogue is difficult to put in place, but it remains an important means of helping people with speech problems. To this end, we have built our work around a control interface that includes multiple communication tools. Videoconferencing between a doctor and a patient, using a camera and Skype, allows the doctor to make an initial diagnosis without the patient having to travel. Users can also access, for example, a specialized help site that gives them precise information about the illness in spoken form. An Automatic Recognition System of Continuous Pathological Speech (ARSCPS), based on Hidden Markov Models (HMM) [1] and the Hidden Markov Model Toolkit (HTK) [1], was developed and adapted to people with speech problems, and linked to our graphical interface. The system transcribes recognized sentences in real time and displays them on a control screen that is independent of the system. The Nemours database [2] used in this application is stored on a server where other databases can be integrated. This work is founded on the VoiceXML standard, which allows voice applications and websites to be developed.
Keywords—Automatic Recognition System of Continuous Pathological Speech (ARSCPS); Hidden Markov Model Toolkit (HTK); Hidden Markov Models (HMM); Human/Machine (H/M); Information and Communication Technology (ICT); Voice over Internet Protocol (VoIP)
I. INTRODUCTION
In order to give people with pathological voice access to the services provided by new information and communication technologies, we set out to develop a Human/Machine (H/M) interface that lets them communicate with doctors. It uses Voice over Internet Protocol (VoIP), a network camera, and access to a personalized website offering, for example, information on voice pathologies.
We have integrated into this interface an Automatic Recognition System of Continuous Pathological Speech (ARSCPS) that we built on Hidden Markov Models (HMM) [1]. The NEMOURS pathological speech database [2] is stored and is accessible via this interface. Fig. 1, Fig. 2 and Fig. 3 illustrate some of these achievements. The paper is organized as follows: after this introduction, a first part presents the modules that make up our H/M interface dedicated to people with pathological voice; a second part describes the steps in developing our ARSCPS, which is one of the main blocks of our H/M system; we end with a conclusion.
A. The various blocks constituting the H/M interface
The interface (Fig. 1) allows users with pathological voice to communicate more effectively, with the aim of improving their quality of life and offering them greater independence. Its blocks include communication via VoIP, a network camera, access to custom websites, and the HMM-based ARSCPS that we have built. For the ARSCPS, a server was set up to store the NEMOURS database, together with links that enable the exchange of text files and ".wav" files with our interface. The interface also gives direct access to a specialized website (Fig. 2) where users can obtain spoken information on speakers' illnesses; this site was created with HTML [3] and PASCAL [4] scripts.
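To make the HTK-based ARSCPS block more concrete, the following Python sketch assembles a command line for HVite, the HTK Viterbi decoder described in [1]. It is only an illustration under stated assumptions: every file name is a hypothetical placeholder, the paper's actual scripts are not reproduced here, and the helper `build_hvite_command` is introduced purely for this example.

```python
from pathlib import Path
from typing import List


def build_hvite_command(
    hmm_defs: Path,
    script_file: Path,
    word_net: Path,
    dictionary: Path,
    hmm_list: Path,
    output_mlf: Path,
) -> List[str]:
    """Assemble an HVite command line for Viterbi decoding.

    All paths are hypothetical placeholders chosen for this sketch.
    """
    return [
        "HVite",
        "-H", str(hmm_defs),     # trained HMM definitions
        "-S", str(script_file),  # script file listing the feature files to decode
        "-i", str(output_mlf),   # output master label file with the transcriptions
        "-w", str(word_net),     # word network built from the task grammar
        str(dictionary),         # pronunciation dictionary
        str(hmm_list),           # list of HMM names
    ]


if __name__ == "__main__":
    cmd = build_hvite_command(
        Path("hmmdefs"), Path("test.scp"), Path("wdnet"),
        Path("dict"), Path("hmmlist"), Path("recout.mlf"),
    )
    print(" ".join(cmd))
    # Running the command requires an HTK installation and real files, e.g.:
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the sketch testable on a machine without HTK; a graphical interface could pass the same list to a process launcher when the user clicks the recognition block.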
1|Page 978-1-5090-4171-8/16/$31.00 ©2016 IEEE
One part of the system is Web-oriented: we exploited the functionalities of the VoiceXML platform [5], which served as the basis for integrating voice applications and the Web into our dialogue system. The VoiceXML platform (Fig. 3) underpins the dialogue between a doctor and a patient suffering from a voice disorder. This dialogue was set up by creating scripts (XML, HTML, PASCAL, and Hidden Markov Model Toolkit (HTK)). Several programming languages (C++, Java, etc.) and several software tools (Delphi, Matlab, Cygwin, etc.) were used. Each block of the dialogue interface allows the user, with a single click, either to establish a call via Skype or mobile phone, to open a dialogue via a network camera integrated with an ARSCPS or with a server containing a pathological voice database (and possibly other databases), or to access a site giving general information on voice pathology via the IP network.
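As an illustration of how a VoiceXML dialogue could expose the interface blocks described above, the following Python sketch generates a minimal VoiceXML document containing a single spoken menu. The menu phrases and target file names are hypothetical and do not come from the authors' scripts; only the element names (`vxml`, `menu`, `prompt`, `enumerate`, `choice`) belong to the VoiceXML standard.

```python
import xml.etree.ElementTree as ET
from typing import Dict

VXML_NS = "http://www.w3.org/2001/vxml"


def build_menu(choices: Dict[str, str]) -> str:
    """Return a VoiceXML document with one <menu> offering the given choices.

    `choices` maps a spoken phrase to the URI of the dialog to open.
    Phrases and URIs are illustrative placeholders.
    """
    # Serialize VoiceXML elements with a default namespace (no prefix).
    ET.register_namespace("", VXML_NS)
    root = ET.Element(f"{{{VXML_NS}}}vxml", {"version": "2.1"})
    menu = ET.SubElement(root, f"{{{VXML_NS}}}menu", {"id": "main"})
    prompt = ET.SubElement(menu, f"{{{VXML_NS}}}prompt")
    prompt.text = "Please say one of: "
    # <enumerate/> asks the platform to read out the available choices.
    ET.SubElement(prompt, f"{{{VXML_NS}}}enumerate")
    for phrase, target in choices.items():
        choice = ET.SubElement(menu, f"{{{VXML_NS}}}choice", {"next": target})
        choice.text = phrase
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_menu({
        "call doctor": "call.vxml",
        "voice pathology information": "info.vxml",
        "dictation": "dictation.vxml",
    }))
```

Each `choice` corresponds to one block of the interface, so adding a new service would only require a new entry in the dictionary.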
Fig. 1. Diagram of the Human/Machine interface dialog
Fig. 2. Website of a System of Automatic Recognition of Continuous Pathological Speech
Fig. 3. The platform VoiceXML
B. The Automatic Recognition System of Continuous Pathological Speech (ARSCPS)
This interface integrates an Automatic Recognition System of Continuous Pathological Speech for the NEMOURS database, implemented with the Delphi software and the PASCAL programming language. The ARSCPS that we have put in place is based on the HTK toolkit [1]. It allowed us to manage text and sound files from any database (NEMOURS or other), to listen to ".wav" sound files, and to transcribe the files listened to (Fig. 4). Passing a NEMOURS sentence through the ARSCPS lets us check whether it is recognized or not and display the recognition results obtained (Fig. 5). It also allowed us to make the system speaker-independent and able to operate in real time (Fig. 6). This lays the foundation for an automatic dictation system for people who suffer from speech disorders.
Fig. 4. Database NEMOURS and Speech Recognition
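Checking whether a database sentence is recognized or not amounts to comparing the recognized transcription with the reference text. The Python sketch below shows one way this comparison could be done, using a plain unit-cost word alignment and the usual percent-correct and accuracy measures. It is an assumption-laden illustration rather than the scoring used in the paper: HTK's own scoring tool may weight errors differently, and the sentence pair in the demo is invented, not taken from NEMOURS.

```python
from typing import Dict, List, Tuple


def align_counts(reference: List[str], recognized: List[str]) -> Tuple[int, int, int, int]:
    """Return (hits, deletions, substitutions, insertions) from a
    unit-cost minimum-edit-distance alignment of two word sequences."""
    n, m = len(reference), len(recognized)
    # table[i][j] = (distance, hits, dels, subs, ins) for the first i
    # reference words aligned with the first j recognized words.
    table = [[(0, 0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0, 0)
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, 0, j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, h, dl, s, ins = table[i - 1][j - 1]
            if reference[i - 1] == recognized[j - 1]:
                diag = (d, h + 1, dl, s, ins)       # correct word
            else:
                diag = (d + 1, h, dl, s + 1, ins)   # substitution
            d, h, dl, s, ins = table[i - 1][j]
            delete = (d + 1, h, dl + 1, s, ins)     # reference word missing
            d, h, dl, s, ins = table[i][j - 1]
            insert = (d + 1, h, dl, s, ins + 1)     # extra recognized word
            table[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    _, hits, dels, subs, ins = table[n][m]
    return hits, dels, subs, ins


def score(reference: str, recognized: str) -> Dict[str, object]:
    """Compare a recognized sentence with its reference transcription."""
    ref, rec = reference.lower().split(), recognized.lower().split()
    if not ref:
        raise ValueError("reference transcription is empty")
    hits, dels, subs, ins = align_counts(ref, rec)
    n = len(ref)
    return {
        "sentence_correct": ref == rec,
        "percent_correct": 100.0 * hits / n,
        "accuracy": 100.0 * (hits - ins) / n,
    }


if __name__ == "__main__":
    # Invented sentence pair used only to exercise the functions.
    print(score("the bash is pairing the bath", "the bash is pairing the path"))
```

The sentence-level flag answers the "recognized or not" question directly, while the word-level measures give a finer view when only part of the sentence is wrong.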
II. CONCLUSION
First of all, building this interface required good use of the VoiceXML platform and the creation of several scripts at different levels (platform, ARSCPS, website). This led us to: the realization of an ARSCPS (Fig. 7), which we integrated with our interface; the creation of links with our NEMOURS database; the management of text and sound files; access via the Web to personalized sites; the production of dialogues between doctor and patient via VoIP, using Skype and also mobile phones; and dialogue via a network camera.
Fig. 5. Result of Automatic Recognition System of Continuous Pathological Speech (ARSCPS)
Finally, among the difficulties and problems we faced, we can cite: the realization of an ARSCPS; the recognition rate for the sentences of the database (sentences pronounced by speakers suffering from speech disorders); the realization of the interface and the creation of links between the interface, our ARSCPS and the database; the recognition of speech from a speaker who is not in the database; and the creation of the dialogue between doctor and patient through the VoiceXML platform. Despite the results obtained at this stage, improvements remain to be made to the intelligibility of the database sentences, by detecting deviant zones or by filtering, which should further improve the performance of our ARSCPS. Today, our challenge is to enhance this system in order to give hope to frail individuals who suffer from speech problems and to offer them greater independence in their lives.
Fig. 6. Speech Recognition in real time
Fig. 7. Automatic Recognition System of Continuous Pathological Speech (ARSCPS)
REFERENCES
[1] S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, "The HTK Book", version 3.1, pp. 1-277, 2006.
[2] X. Menéndez-Pidal, J. B. Polikoff, S. M. Peters, J. E. Leonzio, and H. T. Bunnell, "The Nemours Database of Dysarthric Speech", J. IEEE, in press.
[3] M. Nebra, "Apprenez à créer votre site web avec HTML5 et CSS3" (Learn how to create your website with HTML5 and CSS3), pp. 1-248, June 2013.
[4] Club des développeurs et IT pro (Club of developers and IT pros), "Les meilleurs cours et tutoriels sur la programmation et l'informatique professionnelle" (The best courses and tutorials on programming and business computing), Cours et tutoriels sur la programmation Delphi (Courses and tutorials on Delphi programming). http://delphi.developpez.com/cours/?page=langage [January 20th, 2016].
[5] Voxeo, An Aspect Company, "XML Development Languages Documentation", W3C, 685 Clyde Avenue, Mountain View, CA 94043, version 2.1, pp. 1-254, January 20th, 2015.