Connectionist or neural network models have often been used to model
language and speech. As Dell et al. ... Tutorial on Neural Systems Modeling.
Sinauer ...
A Biologically Inspired Neural Network for Modeling Phrase-Final Lengthening Connectionist or neural network models have often been used to model language and speech. As Dell et al. [4] point out, though, standard connectionist models of speech production lack the ability to simulate time progression, which is as essential to speech rhythm. Recent work in the area of speech prosody has been concerned with the mechanisms involved in phrase-final lengthening, and specifically how phrase-final lengthening interacts with stress or prosodic prominence [3, 9]. This work proposes a central pattern generator (CPG) model, a type of biologically-plausible neural network model, for the interaction between phrase-final lengthening and stress. Several articles have suggested that a CPG may contribute to speech production, but work directly investigating the capability of such a system to generate speechlike behavior is lacking [5, 6, 8, 2, 7]. A production study was designed after Turk & Shattuck-Hufnagel [9] to investigate the interaction of stress and lengthening at the end of English phrases. Adult American English speakers were recorded reading aloud sentences containing three proper nouns which could be read as two phrases with the structure (X or Y) and Z or X or (Y and Z) in which the number of syllables and stress location were manipulated. These sentences were presented one per slide in a slideshow, with sentence in the set appearing twice: once with a set of parentheses around the first two target nouns, and once with a set of parentheses around the second two, indicating how the phrases were to be disambiguated. Syllable duration measurements are taken for each of the syllables in the target nouns. Preliminary results support previous findings that phrase-final lengthening in English affects stressed syllables near phrase boundaries and phrase-final syllables while leaving unstressed syllables between the two unaffected [9]. Domain-based models of prosodic lengthening have so far been unable to provide a unified account of this phenomenon [9, 3]. An artificial neural network is proposed which comprises three half-center oscillators which serve as the inputs to three thresholded integrate-and-fire artificial neurons. One cell of each halfcenter provides excitatory input to its connected thresholded node, and one provides inhibitory input to the same. The resulting level of activation in each of the thresholded nodes is modeled as a sinusoid. The three half-centers oscillate at three different rates, so that of the three halfcenter-thresholded node pairs, there is a small-, medium-, and large-period set. Together these represent three metrical levels of phonological organization, such as syllable, stress foot, and phrase; or syllable, intermediate phrase, intonational phrase. All three thresholded nodes are connected to each other via excitatory and inhibitory connections. This work contributes to our understanding of how prosodic structures are generated and organized by demonstrating the ability of a biologically plausible neural network model to simulate the complex interaction of stress- and boundary-related syllable lengthening effects. Additionally, this study introduces a neurally-inspired model of speech prosody which is designed to be generalizable and flexible enough to be used in the simulation of a variety of speech rhythm and prosody behaviors. 498 words
[1] Anastasio, T.J. (2010). Tutorial on Neural Systems Modeling. Sinauer Associates.
[2] Barlow, S.M & Estep, M. (2006). Central pattern generation and the motor infrastructure for suck, respiration, and speech. Journal of Communication Disorders. 39;5. 366-380.
[3] Byrd, D. & Riggs, D. (2008). Locality interactions with prominence in determining the scope of phrasal lengthening. Journal of the International Phonetic Association. 38. 187-202 .
[4] Dell, G.; Chang, F. & Griffin, Z.M. (1999). Connectionist models of language production: lexical access and grammatical encoding. Cognitive Science. 23;4. 517-542.
[5] von Euler, C. (1983). On the central pattern generator for the basic breathing rhythmicity. Journal of Applied Physiology 55; 6. 1647-1659.
[6] Gracco, V.L. & Abbs, J.H. (1988). Central patterning of speech movements. Experimental Brain Research. 71; 3. 515-526.
[7] Lund, J.P, & Kolta, A. (2006). Brainstem circuits that control mastication: Do they have anything to say during speech?. Journal of Communication Disorders. 39; 5. 381-390. [8]Smith, A. (1992). The control of orofacial movements in speech. Critical Reviews in Oral Biology & Medicine. 3;3. 233-267.
[9] Turk, A. & Shattuck-Hufnagel. (2007). Multiple targets of phrase-final lengthening in American English. Journal of Phonetics. 35; 4. 445-472.