JAVAWIKI: A MOBILE-BASED JAVA PROGRAMMING LANGUAGE DICTIONARY USING SPEECH RECOGNITION

Cyra Aleeza A. Ebuenga1, Hazel Joy V. Catacutan2, Christine Dianne Z. Castro3, and Professor Albert A. Vinluan4

1 College of Computer Studies, New Era University, Philippines
2 College of Computer Studies, New Era University, Philippines
3 College of Computer Studies, New Era University, Philippines
4 College of Computer Studies, New Era University, Philippines

ABSTRACT This paper discusses the retrieval of data using speech recognition to find the output requested by the user. The data in this case are terms related to the Java programming language. Studying all of these terms can be stressful: there are too many for a student to memorize, and each one is hard to understand on its own. JavaWiki helps students find Java-related terms and understand each one in the simplest way. This mobile-based Java dictionary shows and explains the information for each input recognized from the user's speech command, so users can browse Java terms simply by speaking. To interpret the sound waves coming from the user, speech recognition translates them into a machine-understandable form; put simply, it translates spoken words into text. Speech recognition is also known as automatic speech recognition, computer speech recognition, or speech to text. JavaWiki is a mobile-based Java programming language dictionary in which input is retrieved using speech recognition. Once the input is retrieved, the application automatically produces the desired output, which is the meaning of the word given by the user. It helps users understand more about Java programming language terms.

KEYWORDS Mobile phone, Sound, Speech Signal, Speech Recognition, Innovation, Java Programming Language

INTRODUCTION The mobile phone is one of those influential advances in technology upon which other innovations are built. Mobile phones keep us connected wherever we are. They are becoming an extension of ourselves, since we can always have them with us. They are part of an explosion of communication options, and smartphones can support all of those options: voice, text, email, and more. A smartphone tends to be both a business and a personal device, according to Neustein, Amy (2010) [1]. When we use our mobile phones we usually make sounds, especially when answering a call. Sound is a vibration that propagates as a typically audible mechanical wave of pressure and displacement through a medium such as air or water. According to Planenerer, B., when speech sounds are produced, the air flow from the lungs first passes the glottis and then the throat and mouth. Depending on which speech sound is articulated, the speech signal can be excited in three possible ways: voiced excitation, unvoiced excitation, and transient excitation. In voiced excitation, the glottis is closed and the air pressure forces it to open and close periodically, generating a periodic pulse train. In unvoiced excitation, the glottis is open and the air passes through a narrow passage in the throat or mouth. In transient excitation, a closure in the throat or mouth raises the air pressure, and suddenly opening the closure makes the air pressure drop immediately [2].

According to Kimberlee A. Kemble, project manager of Voice Systems Middleware Education at IBM Corporation, speech recognition allows you to provide input to an application with your voice. It is performed by a software component known as the speech recognition engine. The primary function of a speech recognition engine is to process spoken input and translate it into text that an application understands. The application can then do one of two things. If it interprets the result of recognition as a command, it is a command and control application. An example is one in which the caller says “check balance” and the application returns the current balance of the caller's account. If it handles the recognized text simply as text, it is a dictation application. In a dictation application, if you said “check balance”, the application would not interpret the result but would simply return the text “check balance” [3]. According to D.S. Malik, “programming is a process of problem solving and is a process of planning and creating a program. Learning a programming language is like learning to become a chef or learning to play a musical instrument. All three skills require direct interaction with the tools”. A programmer must have fundamental knowledge of the language and must test each program on the computer to make sure that it does what it is supposed to do [4]. According to Pradnya Choudhari, Java has gained enormous popularity since it first appeared. Its rapid ascension and wide acceptance can be traced to its design and programming features, particularly its promise that you can write a program once and run it anywhere. Java was chosen as the programming language for network computers and has been perceived as a universal front end for the enterprise database.
As stated in the Java language white paper by Sun Microsystems: “Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture neutral, portable, multithreaded, and dynamic.” [5] According to Dr. Damodharan and Mr. Rengarajan, innovative teaching and learning methods serve to develop human capital by sending and receiving information between students and teachers. Teachers try to share the best of their knowledge in the way they themselves understood it. Innovative educational methods have the ability to develop education, empower people, support governance, and stimulate the effort to achieve the human development goal [6]. One of the difficulties students face in learning Java programming terms and programs is knowing how, what, when, and where to use the terms or keywords that a professor uses in teaching or in a machine problem. The proposed study takes a spoken word as input and interprets the result of recognition as a command, which makes the application a command and control application. For example, the user says “array”, and the application returns the meaning, a sample program, and an explanation of how, where, and when to use an array in a program. This helps students understand Java programming terms and, beyond simply teaching them, adds motivation in thinking, knowledgeability, and independence for both student and teacher.
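The command and control behaviour described for the “array” example can be sketched as a simple lookup keyed by the recognized text. This is a minimal illustration under stated assumptions, not the JavaWiki implementation: the class name `JavaWikiLookup`, the `Entry` fields, and the single dictionary entry are all hypothetical, and the speech recognition engine is represented only by a plain string standing in for its output.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: names and entries are illustrative, not taken from JavaWiki.
final class JavaWikiLookup {

    /** One dictionary entry: meaning, sample program, and usage explanation. */
    static final class Entry {
        final String meaning;
        final String sampleProgram;
        final String usage;

        Entry(String meaning, String sampleProgram, String usage) {
            this.meaning = meaning;
            this.sampleProgram = sampleProgram;
            this.usage = usage;
        }
    }

    private static final Map<String, Entry> DICTIONARY = new LinkedHashMap<>();

    static {
        // Assumed sample content; a real dictionary would load many terms from storage.
        DICTIONARY.put("array", new Entry(
                "A fixed-size container that holds values of a single type.",
                "int[] scores = new int[3];",
                "Use an array when the number of elements is known in advance."));
    }

    /** Normalizes recognizer output so that " Array " and "array" map to the same key. */
    static String normalize(String recognizedText) {
        return recognizedText == null ? "" : recognizedText.trim().toLowerCase(Locale.ROOT);
    }

    /** Command and control style: the recognized text is interpreted as a lookup command. */
    static Optional<Entry> lookup(String recognizedText) {
        return Optional.ofNullable(DICTIONARY.get(normalize(recognizedText)));
    }

    public static void main(String[] args) {
        String recognizedText = "Array"; // stand-in for text returned by a speech engine
        Optional<Entry> entry = lookup(recognizedText);
        if (entry.isPresent()) {
            System.out.println("Meaning: " + entry.get().meaning);
            System.out.println("Sample:  " + entry.get().sampleProgram);
            System.out.println("Usage:   " + entry.get().usage);
        } else {
            System.out.println("No Java term found for: " + recognizedText);
        }
    }
}
```

A dictation application, by contrast, would skip `lookup` and simply display `recognizedText` unchanged.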

Related Literature

Setswana Speech Recognizer for Computer Based Applications. Authors: Oratile Leteane and Francis J. Ogwu, University of Botswana.

According to Long, B., the topic of spoken language interfaces to computers has attracted engineers and speech scientists [6]. A speech recognition system is a specific form of natural interaction between humans and computers, in which users speak and listen to an interface rather than typing or writing on the screen [7]. Automatic Speech Recognition (ASR) and Text To Speech (TTS) are the two major components that support such a system [8]. Speech production takes the factors from a fully tagged phonetic sequence and generates the corresponding speech waveform [9]. During training and recognition, ASR is a resource- and data-intensive process. For better recognition accuracy, the data used for training should normally come from different speakers [10]. A recognized word is an output of the speech processing part and may be either a final product or an input to the speech recognition system for further processing. It is observed that the recognition of continuous speech is affected by the rate of spoken language [11]. The acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear [9].

According to Chowdhury, S., SAPI abstracts the developer from the low-level details of the speech recognition (SR) engine; nonetheless, it is essential that the developer knows the potential, functionality, and workings of the engine in order to model and optimize the target application [12]. In previous research, most developed and operational voice recognition applications were based on whole-word matching [7]. A normal speech waveform may vary from time to time depending on the physical condition of the speaker's vocal cords [13]. Numerous decoders with different search algorithms can be used in the development of speech recognizers, and the recent ones use hidden Markov models (HMMs). Examples include the Hidden Markov Model Toolkit (HTK) decoder in [11], the Sphinx decoder in [14], and the one-pass decoder in [15].
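To give a feel for what an HMM-based decoder searches for, the sketch below uses Viterbi decoding, a standard algorithm for finding the most likely hidden-state sequence under an HMM. The passage above does not name a search algorithm, so choosing Viterbi here is an assumption, and the class name `ViterbiSketch`, the two-state model, and all probabilities are toy values invented for illustration. Production decoders such as those cited are far more elaborate.

```java
// Hypothetical toy example: the model and its probabilities are illustrative only.
final class ViterbiSketch {

    /**
     * Returns the most likely state sequence for the observations.
     * start[k]   = probability of starting in state k
     * trans[j][k] = probability of moving from state j to state k
     * emit[k][o]  = probability that state k emits observation symbol o
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length;
        int states = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][states]; // best log-probability ending in each state
        int[][] back = new int[n][states];        // predecessor state on that best path

        for (int k = 0; k < states; k++) {
            score[0][k] = Math.log(start[k]) + Math.log(emit[k][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int k = 0; k < states; k++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < states; j++) {
                    double candidate = score[t - 1][j] + Math.log(trans[j][k]);
                    if (candidate > bestScore) {
                        bestScore = candidate;
                        best = j;
                    }
                }
                score[t][k] = bestScore + Math.log(emit[k][obs[t]]);
                back[t][k] = best;
            }
        }

        // Pick the best final state, then follow the back-pointers to recover the path.
        int last = 0;
        for (int k = 1; k < states; k++) {
            if (score[n - 1][k] > score[n - 1][last]) {
                last = k;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.5, 0.5};
        double[][] trans = {{0.9, 0.1}, {0.1, 0.9}};
        double[][] emit = {{0.9, 0.1}, {0.1, 0.9}};
        int[] path = decode(new int[] {0, 0, 1, 1}, start, trans, emit);
        System.out.println(java.util.Arrays.toString(path));
    }
}
```

Log-probabilities are used instead of raw probabilities so that long observation sequences do not underflow to zero.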

Figure 1. Interfacing computer application and speech recognizer

The Setswana speech recognizer under development is adaptable to any Setswana-based, speech-driven application as long as the words needed to drive the application are within the defined set of recognizable words in the database. The questions and answers are stored in a database, and both systems connect to the same database to load the questions and answers, which are written in the Setswana language. Figure 2 shows the tables that store the questions and answers used.

Figure 2. Tables storing questions and answers

Speech data were created by recording the voices of different speakers and converting the recordings into Mel Frequency Cepstral Coefficients (MFCC). The creation took into consideration that people may pronounce the same words differently. In this research, each word was recorded from at least fifteen different speakers in order to accommodate different pronunciations. The speakers' characteristics were based on their level of literacy, age, gender, nationality, home district, and language dialect. Each speaker's characteristics were recorded, identified by the speaker's name, and given a unique number for easy differentiation. During recording sessions, each sound wave was recorded with the parameters below:

Figure 3. Parameters for each sound wave

In this work, a 16 kHz sample rate was chosen because it provides more accurate high-frequency information, and 16 bits per sample divides the amplitude range into 65,536 possible values. Audacity was used for recording and editing sound files. A transcript file is needed to represent what the speakers are saying in each audio file, and this information is provided to the trainer through the transcript file. A pronunciation dictionary determines how each word is pronounced. It maps all acoustic events and words in the transcripts onto the acoustic units to be trained. Redundancy in the form of extra words is permitted.
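The figures quoted for the recording parameters follow from simple arithmetic, which the sketch below works through. The class and method names are hypothetical, and the single (mono) channel is an assumption, since the text does not state the channel count.

```java
// Hypothetical helper: illustrates the arithmetic behind the recording parameters.
final class AudioParameters {

    /** Number of distinct amplitude values representable with the given bit depth. */
    static long quantizationLevels(int bitsPerSample) {
        return 1L << bitsPerSample; // 2 to the power of bitsPerSample
    }

    /** Highest frequency that can be represented at the given sample rate (Nyquist limit). */
    static double nyquistHz(int sampleRateHz) {
        return sampleRateHz / 2.0;
    }

    /** Uncompressed PCM data rate in bytes per second. */
    static long bytesPerSecond(int sampleRateHz, int bitsPerSample, int channels) {
        return (long) sampleRateHz * (bitsPerSample / 8) * channels;
    }

    public static void main(String[] args) {
        int sampleRateHz = 16_000; // from the text
        int bitsPerSample = 16;    // from the text
        int channels = 1;          // assumption: mono recording

        System.out.println("Levels:     " + quantizationLevels(bitsPerSample)); // 65536
        System.out.println("Nyquist Hz: " + nyquistHz(sampleRateHz));
        System.out.println("Bytes/s:    " + bytesPerSecond(sampleRateHz, bitsPerSample, channels));
    }
}
```

Under these assumptions, one second of recorded speech occupies 32,000 bytes before any compression, and frequencies up to 8 kHz can be captured.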

Methodology Speech recognition is the ability of machines to respond to spoken commands and to decode the human voice into digitized speech that the computer can process. It allows “hands-free” control of various electronic devices by determining the textual representation of the speech, which then serves as the input or task to be executed. Speech recognition technology (SRT) is a computer software system that converts the spoken word to text: it takes an audio stream as input and turns it into a text transcription. To convert the human voice into on-screen text or a machine-understandable form, the computer must go through several steps. When someone speaks, vibrations are made in the air. In the first step, the system digitizes the sound and filters it to remove unwanted noise and to separate it into different frequency bands. Frequency is the number of sound-wave vibrations per second, heard by humans as differences in pitch. People do not speak at the same speed every day, so the sound must be adjusted to match the speed of the sample sound templates already stored in the system's memory. The next step is the focus of most speech recognition research. It seems simple, but it is actually the most difficult task to accomplish. The system examines phonemes in the context of the other phonemes around them. It runs these contextual phonemes through a complex statistical model and compares them to a large library of known words, phrases, and sentences. The program then determines what the user was probably saying and outputs it as text or as a computer command. Applications of this technology can also be seen in speech-to-text processing and simple data entry. Since speech can be used to control a computer by verbalizing text or commands, it offers convenience to users such as those who are physically incapacitated. Learners who are blind can benefit from using this technology.
With this technology, blind learners can dictate words and then hear the computer read them back, and they can operate a computer by voice command as an alternative to looking at the screen and keyboard. It also benefits others who are physically disabled or who suffer from repetitive strain injury. By using speech-to-text programs, they can be relieved of the discomfort of typing, handwriting, or working on school assignments. Note, however, that this research focuses only on terms related to the Java programming language. From the knowledge perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. Speech recognition is offered only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese. The technology implies only that the device can take dictation, not that it understands what is being said. Comprehending human languages falls under a different field of computer science called natural language processing. Speech recognition techniques are usually evaluated in terms of accuracy and speed. Speech recognition also depends on vocalization. Vocalizations vary in terms of accent, pronunciation, articulation, roughness, nasality, pitch, volume, and speed. Speech is communication using the human voice. Speech is distorted by background noise, echoes, and electrical characteristics.
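The speed-matching step in the methodology, where the incoming sound is adjusted to the speed of a stored template, is classically handled by dynamic time warping (DTW). The paper does not name a technique, so using DTW here is an assumption. The class name and the one-dimensional sample sequences are hypothetical; a real recognizer would compare feature vectors (such as MFCCs) rather than single numbers.

```java
import java.util.Arrays;

// Hypothetical sketch: DTW over 1-D sequences as a stand-in for template speed matching.
final class DynamicTimeWarping {

    /** Returns the minimum cumulative distance between a and b under time warping. */
    static double distance(double[] a, double[] b) {
        if (a.length == 0 || b.length == 0) {
            throw new IllegalArgumentException("sequences must be non-empty");
        }
        int n = a.length;
        int m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: both advance, only a advances, only b advances.
                double bestPrevious = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + bestPrevious;
            }
        }
        return cost[n][m];
    }

    public static void main(String[] args) {
        double[] template = {1, 2, 3, 2, 1};
        double[] slowUtterance = {1, 1, 2, 2, 3, 3, 2, 2, 1, 1}; // same shape, half speed
        double[] otherUtterance = {3, 3, 3, 3, 3};

        System.out.println("Slow vs template:  " + distance(slowUtterance, template));
        System.out.println("Other vs template: " + distance(otherUtterance, template));
    }
}
```

Because DTW lets one sequence stretch against the other, an utterance spoken at a different speed can still match its template closely, while a different utterance yields a larger distance.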

Conclusion and Recommendation This paper introduces a mobile-based application known as “JavaWiki”. It uses speech recognition to translate sound waves into a machine-understandable form. The study must produce the desired output, which is the meaning of the Java programming language term that the user asks for. This is a first attempt at a mobile-based dictionary using speech recognition in the field of the Java programming language.

It lets users understand the language more easily and effectively. The deployment of the speech interface in the dictionary application focuses on presenting the definitions the user is looking for. The user must ask only for terms related to the Java programming language. The study will be able to meet its objective of extracting information using speech recognition. Concerning scalability and better performance, some issues remain that need more research and development, for instance the development of the databases and the approach to recognizing user speech input. In general, the study offers a dictionary that focuses only on Java programming language terms, with the input coming from the user's voice command. We therefore recommend that development of the system continue in order to fulfill the needs of its users. JavaWiki will be beneficial to every netizen because it helps them understand more about the Java programming language without the hassle of typing or writing on the screen. All the user has to do to find the meaning of a Java term is to speak the term they want to know.

Acknowledgment This study would have been impossible without the people who extended their utmost benevolence in sharing their knowledge and time with us. We thank our families for their undying moral and financial support; our friends and classmates for sharing their sources and for keeping our spirits high when we needed it the most; the other teams who generously shared their ideas with us so that we could successfully finish this case study; and Professor Albert A. Vinluan, who shared his own knowledge of the field and its fundamentals. Most of all, we would like to thank our Heavenly Father for giving us the strength, knowledge, and courage we needed to get this study done.

REFERENCES
[1] Planenerer, B 2005, An Introduction to Speech Recognition, pp. 3. Retrieved from
[2] Kemble, Kimberlee A., An Introduction to Speech Recognition, pp. 1. Retrieved from
[3] Malik, D.S. 2012, Java Programming: From Problem Analysis to Program Design, 5th edition, pp. 41.
[4] Choudhari, Pradnya, Java Advantages & Disadvantage, pp. 1. Retrieved from
[5] Dr. Damodharan and Mr. Rengarajan, Innovative Methods of Teaching, pp. 1. Retrieved from
[6] Long, B 1994, Natural Language as an Interface Style, Dynamic Graphics Project, University of Toronto, Viewed 20 September 2011.
[7] Preece, J, Sharp, H and Rogers, Y 2007, ‘What is Interaction Design’, Interaction Design: Beyond Human-Computer Interaction, 2nd edition, John Wiley & Sons Ltd, England, pp. 2-41.
[8] Carnegie Mellon University 2000, Sphinx Train Documentation, Viewed 13 March 2011.
[9] Freitas, J 2007, ‘Spoken Language Interfaces for Mobile Devices’, M.Sc. Thesis, Instituto Superior de Engenharia de Lisbon, Portugal.
[10] Wikipedia 2011, Speech Recognition, Viewed 20 December 2010.
[11] Young, S, Evermann, G, Kershaw, D, Moore, G, Odell, J, Ollason, D, Valtchev, V, Woodland, P 2002, ‘HMM Definition Files’, The HTK Book, Version 3.1, Cambridge, pp. 91-110.
[12] Chowdhury, S 2010, ‘Implementation of Speech Recognition for Bangla’, B.Sc. Thesis, BRAC University, Bangladesh.
[13] Doe, H 1998, ‘Evaluating the Effects of Automatic Speech Recognition Word Accuracy’, M.Sc. Thesis, Virginia Polytechnic Institute and State University, Virginia.
[14] Lamere, P, Kwok, P, Walker, W, Gouvea, E, Singh, R, Raj, B, Wolf, P 2003, ‘Design of the CMU Sphinx-4 Decoder’, 8th European Conference on Speech Communication and Technology (EUROSPEECH).
[15] Pylkkonen, J 2005, ‘An efficient one-pass decoder for Finnish large vocabulary continuous speech recognition’, In Proceedings of the 2nd Baltic Conference on Human Language Technologies, pp. 167-172.