visiBabble Demo

Harriet Fell, College of Computer Science, Northeastern University, Boston, MA, USA, +1 617 373 2198, [email protected]

ABSTRACT

The visiBabble system responds with animations to an infant’s syllable-like productions and records the acoustic-phonetic analysis. The system reinforces production of syllabic utterances associated with later language and cognitive development. This demo will show off new animated responses and recent improvements in acoustic-phonetic feature detection.

Keywords

pre-speech vocalizations, real-time analysis, feedback

INTRODUCTION

Neurological and oral/motor impairments can significantly impact speech. A child may not be able to produce a sound when he or she wants to or may produce a limited range of sounds [1]. Infant vocalizations are effective predictors of later articulation and language abilities. Research studies emphasize the importance of early speech intervention for children at risk for being non-speaking. They also point out the difficulty of providing sufficient speech practice and feedback for children with such atypical speech patterns through traditional forms of intervention and interaction.

THE VISIBABBLE SYSTEM

The visiBabble system [2, 3] includes a modern notebook computer, a microphone, a flat-panel display, and software. The visiBabble system responds to the child’s utterances with a variety of large, vibrant animations. A speech pathologist or parent can set the system to respond to various features in the child’s vocalizations. For example:

1. syllable production
2. pitch variation
3. syllable or utterance complexity

As visiBabble runs, it makes a digital recording of the session. It also saves a record of the times and types of landmarks and syllables it found during the session.

Joel MacAuslan, Speech Technology and Applied Research Corp., Bedford, MA, +1 781 861 STAR, [email protected]

Greater variety of landmarks indicates greater vocal complexity. The visiBabble program can be run in an ABA single-case study format [4], and data are collected during all phases to allow a comparison of behavior during the baseline and active phases. At the end of a session, the information recognized and collected in real time by visiBabble is written to a tab-delimited file. Summary information is also written to the file, e.g., the number and average duration of each syllable type that occurred, the average number of syllables per utterance, and the number of syllables with each pitch-contour pattern.

HOW IT WORKS

A. Landmarks

The visiBabble software is based on the Stevens Landmark Theory [5]. Central to this theory are landmarks, points in an utterance around which listeners extract information about the underlying distinctive features. They mark acoustically abrupt events. Landmark processing begins by analyzing the signal in several broad frequency bands (see Figure 1). Because of the different vocal-tract dimensions, the appropriate frequencies for the bands are different for adults and infants; however, the procedure itself does not vary. First, an energy waveform is constructed in each of the bands. Then the rate of rise (or fall) of the energy is computed, and peaks in the rate are detected. These peaks therefore represent times of abrupt spectral change in the bands. Simultaneous peaks in several bands identify consonantal landmarks.

B. Syllables and Utterances

The program identifies sequences of landmarks, e.g., +g-g or +s-g-b, as syllables based on the landmark order and inter-landmark timing. Among other constraints, syllables must contain a voiced segment of sufficient length. Figure 3 shows an example of this rule. An utterance is a sequence of syllables in which gaps between syllables are no more than (nominally) 200 milliseconds long. Both syllables and utterances may have properties of their own, such as a pitch template (rise/fall/rise) or a peak zero-crossing rate.
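The grouping rules in this section can be illustrated with a minimal sketch. It is not the visiBabble implementation: the `(time, label)` landmark representation, the names `Syllable`, `find_syllables`, and `group_utterances`, and the 50 ms minimum voiced duration are all assumptions made for illustration. Only the nominal 200 ms gap comes from the text. The sketch also simplifies the syllable rule to "one syllable per sufficiently long voiced stretch", whereas the program uses landmark order and inter-landmark timing more fully.

```python
from dataclasses import dataclass

MIN_VOICED_S = 0.05  # assumed value; the text says only "sufficient length"
MAX_GAP_S = 0.200    # nominal inter-syllable gap given in the text


@dataclass
class Syllable:
    start: float  # time of the +g landmark, in seconds
    end: float    # time of the matching -g landmark, in seconds
    pattern: str  # landmark labels spanned, e.g. "+g-g"


def find_syllables(landmarks):
    """Turn time-ordered (time_s, label) landmarks into syllables.

    Simplification: each voiced stretch (+g ... -g) lasting at least
    MIN_VOICED_S becomes one syllable. Landmarks outside voicing are dropped.
    """
    syllables = []
    onset = None
    labels = []
    for time_s, label in landmarks:
        if label == "+g":
            onset, labels = time_s, [label]
        elif onset is not None:
            labels.append(label)
            if label == "-g":
                if time_s - onset >= MIN_VOICED_S:
                    syllables.append(Syllable(onset, time_s, "".join(labels)))
                onset = None
    return syllables


def group_utterances(syllables):
    """Group syllables into utterances; a gap above MAX_GAP_S starts a new one."""
    utterances = []
    for syl in syllables:
        if utterances and syl.start - utterances[-1][-1].end <= MAX_GAP_S:
            utterances[-1].append(syl)
        else:
            utterances.append([syl])
    return utterances


# Hypothetical landmark stream: an unvoiced noise burst, then three babbles.
landmarks = [
    (0.00, "+b"), (0.03, "-b"),
    (0.50, "+g"), (0.62, "+s"), (0.80, "-g"),
    (0.90, "+g"), (1.10, "-g"),
    (2.00, "+g"), (2.30, "-g"),
]
syllables = find_syllables(landmarks)
print([s.pattern for s in syllables])                  # ['+g+s-g', '+g-g', '+g-g']
print([len(u) for u in group_utterances(syllables)])   # [2, 1]
```

The initial +b/-b pair contains no voicing and so produces no syllable, which mirrors the ignored noise segment in Figure 3. The first two babbles are 100 ms apart and form one utterance; the third follows a 900 ms gap and starts a new one.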

Figure 1: Initial spectral analysis of an infant utterance: voicing (bottom) and five frequency bands' energy waveforms. Landmarks are identified by large, abrupt energy increases or decreases that are simultaneous in several bands. (a) Too few bands show large, simultaneous changes in energy. (b) All bands show large, simultaneous energy increases immediately before the onset of voicing, identifying a +b (burst) landmark, to be followed immediately by +g (voicing). (c) All bands show large, simultaneous energy increases during ongoing voicing, identifying a +s (syllabic) landmark.
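The band-energy procedure described under Landmarks and illustrated in Figure 1 can be sketched as follows. This is a simplified illustration, not the visiBabble code. It assumes the per-band energy waveforms have already been computed as lists of dB values, one per analysis frame. The function names, the 9 dB threshold, and the three-band minimum are assumptions, and "simultaneous" is reduced to "in the same frame". The sketch reports only the time and direction of each abrupt change; it does not use voicing to assign the +g, +s, or +b types.

```python
def rate_of_rise(energy_db):
    """First difference of one band's energy waveform (dB per frame)."""
    return [b - a for a, b in zip(energy_db, energy_db[1:])]


def abrupt_changes(energy_db, threshold_db):
    """Find peaks in the rate of rise (or fall) of one band.

    Returns {frame: +1 or -1}, where frame is the index the energy jumps to
    and the sign tells whether the energy rose or fell.
    """
    rate = rate_of_rise(energy_db)
    peaks = {}
    for i, r in enumerate(rate):
        left = abs(rate[i - 1]) if i > 0 else 0.0
        right = abs(rate[i + 1]) if i + 1 < len(rate) else 0.0
        # Local maximum of |rate| that is large enough to count as abrupt.
        if abs(r) >= threshold_db and abs(r) >= left and abs(r) > right:
            peaks[i + 1] = 1 if r > 0 else -1
    return peaks


def find_landmarks(band_energies_db, threshold_db=9.0, min_bands=3):
    """Frames where at least min_bands bands change abruptly the same way."""
    votes = {}
    for energy_db in band_energies_db:
        for frame, sign in abrupt_changes(energy_db, threshold_db).items():
            votes[(frame, sign)] = votes.get((frame, sign), 0) + 1
    return sorted(key for key, count in votes.items() if count >= min_bands)


# Hypothetical data: a band that jumps by 20 dB and later drops back,
# and a band that stays flat.
step = [20, 20, 20, 40, 40, 40, 20, 20]
flat = [20] * 8

print(find_landmarks([step] * 5))               # [(3, 1), (6, -1)]
print(find_landmarks([step] * 2 + [flat] * 3))  # []
```

The second call corresponds to panel (a) of Figure 1: only two of the five bands change, which is too few to declare a landmark under the assumed three-band minimum.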

Figure 3: Ignored noise vs. recognized syllable. (Left segment) Noise marked by only +b and -b landmarks; (right segment) a faint babble marked by +g-s-g. Because any syllable must contain a voiced segment, the loud noise segment is automatically ignored in subsequent processing. The babble, in contrast, has well-defined voicing and sufficient duration and is hence retained.

WHAT'S NEW

We presented results of field tests of the initial version of visiBabble at ASSETS 2004 [3]. Since then, we have extended the feature detection capabilities, the graphic response repertoire, and the reporting and analysis functionality of the system. We have also added sound feedback. Field-testing will start at Northeastern University and at the University of Nebraska-Lincoln in the fall of 2005.

ACKNOWLEDGMENTS

The authors appreciate the encouragement and testing by Prof. Cynthia Cress, Univ. of Nebraska-Lincoln. We also thank Josh Ostrow and Jun Gong for their programming contributions. This work was funded in part by National Institutes of Health STTR grant R42 DC005534.

Figure 2: Syllable Analysis of "seven" spoken by an adult female. In the waveform (top), V denotes the nominal vocalic center of each syllable. Notice that voicing persists without a complete oral closure between the syllables. The second syllable is identified by a landmark-based rule, i.e., a +s that is not closely preceded by a +g.

REFERENCES

1. C.J. Cress and L. Ball, "Strategies for promoting vocal development in young children relying on AAC: Three case illustrations," Proceedings of RESNA '98, Minneapolis, MN, June 1998, RESNA Press, pp. 44-46.
2. H.J. Fell, J. MacAuslan, C.J. Cress, L.J. Ferrier, "Using Early Vocalization Analysis for visual feedback," Proceedings of MAVEBA 2003, Florence, Italy.
3. H.J. Fell, J. MacAuslan, C. Cress, L.J. Ferrier, "visiBabble for Reinforcement of Early Vocalization," Proceedings of ASSETS 2004, Atlanta, GA, pp. 161-168.
4. L.V. McReynolds, K.P. Kearns, Single Subject Experimental Designs in Communication Disorders, Baltimore: University Park Press, 1983.
5. K.N. Stevens, S. Manuel, S. Shattuck-Hufnagel, and S. Liu, "Implementation of a model for lexical access based on features," Proc. ICSLP (Int. Conf. on Speech & Language Processing), Banff, Alberta, 1, 499-502, 1992.

Figure 4: New Graphics. The animations are larger and more varied than in the proof-of-concept version.