
A Comparison between a Multisensory and Unisensory Training Approach in Accent Reduction

Joaquin Vila, Ph.D.
Applied Computer Science, Illinois State University, Normal, Illinois 61790-5150

Manwa L. Ng, Ph.D.
Speech Pathology & Audiology, Illinois State University, Normal, Illinois 61790-4720

Abstract -- Due to the continuing influx of immigrants into the United States, more people are speaking English with an accent. To ensure better communication among people, and more effective interaction between voice-based applications and their users, the need for an effective accent reduction strategy to improve people’s pronunciation skills is apparent. With the proliferation of the Internet and multimedia technologies, a number of multimedia language instructional applications have emerged. These applications provide students with a multisensory experience that facilitates the visualization of the process of speech. This research investigates the effectiveness of multisensory and unisensory training approaches in accent reduction. The outcome of this study shows that the multisensory (multimedia) approach is more effective in reducing accent than the unisensory (audio only) approach. This finding is consistent with some of the studies reported previously in the literature.

Key Words: Multisensory, Unisensory, Accent Reduction

INTRODUCTION

Due to the continuing growth of multiculturalism and ethnic diversity in the United States, more people are speaking English with an accent. Thus, for the sake of better communication among people, and more effective interaction between voice-based applications and their users, there is an increasing need for an efficient and cost-effective accent reduction strategy to improve people’s pronunciation skills.

Traditionally, accent reduction training is done in one of the following ways: 1) instructional materials reinforced with audiotapes (unisensory approach); 2) formal English as a Second Language (ESL) education; and 3) speech-language pathology sessions. Each approach has advantages and disadvantages.

Using instructional materials reinforced with audiotapes (where users learn spoken English by imitating speech samples produced by native English speakers) is affordable and self-paced. However, this method fails to provide users with appropriate feedback and lacks objective assessment: students are responsible for determining whether or not the sounds they produce are accurate. Another drawback of this approach is that most students attempt to produce English sounds based on the sound inventory of their mother tongue.

With formal ESL education, students learn to speak English in a traditional way by attending ESL classes. This approach is more structured and provides better feedback; however, it is more expensive and, to a certain extent, it lacks individuality.

The other alternative for non-native English speakers is to learn to speak English properly with the help of a speech-language pathologist. This approach is usually the most effective and yields the best results because it provides students with personalized training. The speech-language pathologist, a highly specialized professional, is able to pinpoint and correct errors in articulation through customized therapeutic regimes. However, this approach is the most expensive and is therefore not prevalent.

With the proliferation of the Internet and multimedia technologies, a number of multimedia language instructional applications have emerged. When using these applications for accent reduction, students can learn through a multisensory experience. These applications make use of a variety of multimedia elements (including audio and video) that enable students to visualize the process of speech. Some applications display a mid-sagittal view of the human face depicting the articulation of the vocal structures during the production of sounds (i.e., phonemes). The addition of visual feedback in language learning has been shown to be more effective in a number of studies [1].
In a study using Visi-pitch to teach English suprasegmentals (including stress and intonation patterns) to foreign language students, Anderson-Hsieh found that the inclusion of visual information provides the students with an accurate visual representation of suprasegmentals in real time, paired with the normal auditory feedback that occurs during speech. Secules et al. [2] reported significantly higher listening comprehension scores in French university students learning English using a multisensory feedback approach (i.e., audio and video) than in students using only a unisensory feedback approach (i.e., audio). Similar findings have also been reported by de Bot, and de Bot and Mailfert [1]. Findings from these studies indicate that a multisensory approach yields better results in ESL learning and accent reduction.

However, contradictory findings have also been reported. Herron et al. [1] studied the effectiveness of multimedia-based instruction in the reading and writing skills of French students. The outcome of that research indicated no significant difference between the unisensory and multisensory approaches. In another study by Duquette and Painchaud, similar findings were reported.

As the above discussion shows, findings on the effectiveness of a multisensory approach in language instruction are contradictory. Furthermore, there seems to be a lack of research on the impact of a multisensory approach in accent reduction. Another issue of concern is the lack of objective assessment of the accent levels attained by the speakers. Therefore, there is a need to investigate the effectiveness of a multisensory approach in accent reduction training, coupled with an objective and reliable way of evaluating the accent levels of speakers.

PURPOSE OF THE STUDY

This research investigated the effectiveness of a multisensory approach in accent reduction using an unbiased and objective assessment tool. Specifically, the present study attempted to answer the following research questions:

1) Is there a significant difference between the accent levels of participants using a unisensory approach and those of participants using a multisensory approach?
2) Is there a significant improvement in the accent levels of participants after using a unisensory language training approach?
3) Is there a significant improvement in the accent levels of participants after using a multisensory language training approach?

SIGNIFICANCE OF THE RESEARCH

As more and more non-native speakers of English reside in the United States, more people are experiencing communication barriers with native speakers and difficulties interacting with emerging voice-based applications. The main purpose of this research is to investigate the effectiveness of different training approaches for accent reduction. The outcome of this research will shed some light on the development of effective tools for accent reduction training. With more effective accent training tools, non-native speakers of English will be able to speak with higher intelligibility and to interact with voice-based interfaces more effectively.

METHODOLOGY

An experiment was designed to determine the effectiveness of a multisensory training approach for accent reduction. The experimental design used a pretest and posttest on a convenience sample, drawn from students at a Midwestern university who were non-native speakers of English, to determine the accent levels before and after treatment. The subjects were randomly assigned to experimental and control groups. The experimental group was trained using a multisensory approach and the control group using a unisensory approach. For every subject, demographics, accent levels before treatment (pretest), and accent levels after treatment (posttest) were recorded.

Participants

Forty individuals who were non-native speakers of English participated in the study. They were selected from undergraduate and graduate classes and were able to understand and read English, but spoke English with an accent. All participants were divided into two groups: the control group and the experimental group. The control group received traditional unisensory (audio-only) training along with a brief description of the sounds, whereas the experimental group was provided with multisensory (audio and visual) training through the Accent Reduction Tool (ART).

Speech Materials

To assess the accent levels of participants before and after treatment, a list of 21 English words was used. The word list is currently used by speech-language pathologists in formal speech/articulation/voice evaluation.
The list contained simple monosyllabic, bisyllabic, and trisyllabic English words used in everyday conversation.

RESEARCH TOOLS

Accent assessment system

To eliminate biases and inconsistency in assessing accent levels, an accent evaluation tool written in VoiceXML [3] was developed and used to objectively recognize participants’ production of the word list used in the study. VoiceXML is an open standard markup language for voice applications. While the traditional HyperText Markup Language (HTML) formats documents for the web browser and allows for traditional interaction paradigms, such as GUI or WIMP (Windows, Icons, Mouse and Pull-down menus), VoiceXML formats a document for a voice browser with audio output, audio input, and keypad input. Audio input is handled by the voice browser's Automated Speech Recognition (ASR) engine. Audio output is either prerecorded speech or speech synthesized by the text-to-speech engine of the voice browser. (See Figure 1.)

The Accent Assessment System was hosted at BeVocal, Inc. [4], which provides a free hosting platform for web-based voice applications. The Accent Assessment System housed at BeVocal, Inc. can be accessed by telephone.

Figure 1. Architecture of Voice-Based Applications Using VoiceXML.

The Accent Assessment System determines the accent levels based on the number of words produced by participants that are successfully recognized by the ASR engine. The more words the ASR engine recognizes, the better the accent level. The three parameters (each within a range of 0.0 to 1.0) that can affect the speech recognition behavior of the Accent Assessment System are confidence level, sensitivity level, and speed/accuracy ratio. To maintain consistency during the experiments, these parameters were set at 0.7.

Accent reduction tool (ART)

To provide a multisensory experience to the experimental group, an Accent Reduction Tool (ART) was developed. Following sound principles of user-interface design, ART was implemented based on a two-tier client/server architecture. The back end of ART has database connectivity to acoustic data, still images, video, computer graphics animation, and textual information for the list of words used in the study. The front end of ART displayed a combination of multimedia components, including audio, video, computer graphics, and text, to explicate the production of each of the 21 words used during training.

During the multisensory language training session, the words were decomposed into a sequence of phonemes. For the participants to understand the production of the phonemes, the following components were displayed:

1) the spelling of the entire word;
2) the International Phonetic Alphabet (IPA) symbols corresponding to the phonemes of the word;
3) a short explanation of how the phonemes should be produced;
4) the audio playback of the word;
5) the video playback of a frontal view of a native speaker producing the word; and
6) the computer graphic animation of the mid-sagittal view of the vocal structures during the production of the word.

(See Figure 2.)

Figure 2. Accent Reduction Tool (ART).
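The scoring rule of the Accent Assessment System described above (the accent level is the number of word-list items that the ASR engine recognizes, with the recognition parameters fixed at 0.7) can be illustrated with a minimal Python sketch. The `Recognition` record, the `accent_score` function, and the sample words are hypothetical and are not part of the authors' VoiceXML implementation; the sketch also assumes that the confidence level acts as a simple rejection threshold.

```python
from dataclasses import dataclass

# Assumed rejection threshold, matching the 0.7 setting reported in the text.
CONFIDENCE_LEVEL = 0.7


@dataclass
class Recognition:
    """One ASR attempt (hypothetical record, not the authors' data format)."""
    target: str        # word the participant was asked to produce
    recognized: str    # word returned by the ASR engine ("" if none)
    confidence: float  # engine confidence in the range 0.0 to 1.0


def accent_score(results, confidence_level=CONFIDENCE_LEVEL):
    """Count the target words that were recognized with enough confidence."""
    return sum(
        1
        for r in results
        if r.recognized.lower() == r.target.lower()
        and r.confidence >= confidence_level
    )


# Made-up attempts for illustration only.
attempts = [
    Recognition("water", "water", 0.91),    # recognized, counted
    Recognition("three", "tree", 0.84),     # wrong word, not counted
    Recognition("school", "school", 0.55),  # below threshold, not counted
]
score = accent_score(attempts)
print(f"{score} of {len(attempts)} words recognized")
```

Under this reading, a higher count corresponds to a better accent level, and with the 21-word list used in the study the score would range from 0 to 21.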

Experiment

Before starting the experiments, participants were given a brief introduction to the application and were apprised of the approved protocol. After the introduction, they were asked to complete a short demographic questionnaire. Prior to the accent reduction training sessions, the accent level of each participant was determined using the Accent Assessment System. This step yielded the pre-test accent levels. The same test was administered to determine the accent level of each participant after treatment was completed. This yielded the post-test accent levels. The accent levels obtained in the pre-test and the post-test were statistically analyzed.

Statistical analyses

Independent-sample t-tests, two-proportion z-tests, and paired-sample t-tests were used to test for possible differences in the effectiveness of the multisensory and unisensory training approaches for accent reduction.

RESULTS AND DISCUSSION

Results of paired-sample t-tests indicate a significant improvement in the accent level of the experimental group (p < 0.05) after treatment, but not of the control group. This indicates that the unisensory training approach is not effective in reducing participants’ accent. With the addition of visual elements during training sessions, however, participants reduced their accent levels significantly.

A two-proportion z-test comparing the unisensory and multisensory approaches to accent reduction training shows a significant difference in the accent levels attained after treatment. Once again, it appears that the multisensory (multimedia) approach is more effective in reducing accent than the unisensory (audio only) approach. This finding is consistent with some of the studies reported previously in the literature.
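The two test statistics that the results rest on, the paired-sample t statistic and the two-proportion z statistic, can be sketched with the Python standard library as follows. The function names are hypothetical, the numbers are placeholders rather than the study's data, and applying the z-test to the proportion of recognized words in each group is an assumption about how the test was set up.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired pre/post scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # Assumes the differences are not all identical (stdev would be zero).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def two_proportion_z(x1, n1, x2, n2):
    """Return (z statistic, two-sided p-value) using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Placeholder numbers for illustration only; they are NOT the study's data.
pre = [10, 12, 9, 14, 11]
post = [13, 15, 10, 17, 12]
t_stat, df = paired_t(pre, post)

# Assumed framing: recognized words out of all word attempts in each group.
z_stat, p_val = two_proportion_z(60, 100, 40, 100)
print(f"paired t = {t_stat:.3f} (df = {df})")
print(f"z = {z_stat:.3f}, two-sided p = {p_val:.4f}")
```

Converting the t statistic into a p-value requires Student's t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_rel`) could be used for that step.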

CONCLUSION

Due to the continuing influx of immigrants into the United States, more people are speaking English with an accent. To ensure better communication among people, and more effective interaction between voice-based applications and their users, the need for an effective accent reduction strategy to improve people’s pronunciation skills is apparent. The main purpose of this research was to investigate the effectiveness of different training approaches for accent reduction. The outcome of this research shows that the multisensory (multimedia) approach is more effective in reducing accent than the unisensory (audio only) approach. To develop more effective language-training materials, a multisensory approach that includes audio and a variety of visual cues should be used. With more effective accent training tools, non-native speakers of English will be able to speak with higher intelligibility and to interact with voice-based interfaces more effectively.

REFERENCES

[1] Chun, D. M. (1998). Signal analysis software for teaching discourse intonation. LLT Journal, 2, 61-77.
[2] Brett, P. (2000). The Design and Evaluation of a Multimedia Application for Second Language Listening Comprehension. Unpublished doctoral dissertation. The University of Wolverhampton.
[3] W3C (2003, April). Voice Extensible Markup Language (VoiceXML) Version 2.0. [WWW document]. http://www.w3.org/TR/voicexml20/
[4] BeVocal, Inc. (2003, April). BeVocal Café: Supercharge your Phone! [WWW document]. http://cafe.bevocal.com

Acknowledgments

The authors wish to acknowledge the contribution of Saritha R. Thotli to this research.