Formant compensation for auditory feedback with English vowels

Takashi Mitsuya,a) School of Communication Sciences and Disorders, National Centre for Audiology, Western University, Elborn College, Room 1207, London, Ontario N6G 1H1, Canada
Ewen N. MacDonald Department of Electrical Engineering, Technical University of Denmark, Ørsteds Plads, Building 352, Room 116, 2800 Kgs. Lyngby, Denmark
Kevin G. Munhall Department of Psychology, Queen’s University, Humphrey Hall, 62 Arch Street, Kingston, Ontario K7L 3N6, Canada
David W. Purcell School of Communication Sciences and Disorders, National Centre for Audiology, Western University, Elborn College, Room 2262D, London, Ontario N6G 1H1, Canada
(Received 27 November 2014; revised 12 June 2015; accepted 16 June 2015; published online 20 July 2015)

Past studies have shown that speakers spontaneously adjust their speech acoustics in response to auditory feedback that is perturbed in real time. In the case of formant perturbation, the majority of studies have examined speakers' compensatory production using the English vowel /ɛ/ as in the word "head." Consistent behavioral observations have been reported, and there is lively discussion as to how the production system integrates auditory versus somatosensory feedback to control vowel production. However, different vowels provide different oral and proprioceptive sensations due to differences in the degree of lingual contact or jaw openness, which may in turn influence the ways in which speakers compensate for auditory feedback. The aim of the current study was to examine speakers' compensatory behavior with six English monophthongs. Specifically, the current study tested whether "closed vowels" would show less compensatory production than "open vowels," because closed vowels' strong lingual sensation may richly specify production via somatosensory feedback. Results showed that, indeed, speakers exhibited less compensatory production with the closed vowels. Thus sensorimotor control is not fixed across all vowels; instead it exerts different influences on different vowels. © 2015 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4923154]
Pages: 413–424
I. INTRODUCTION
Speech production is a process through which the mental representation of speech sounds is transformed into the physical form of acoustics by generating and enacting articulatory plans. Although many processes take place between the mind and the physical world, we are readily able to produce speech sounds accurately. “Accuracy” here loosely means that we are able to reliably produce the sounds that we intend. Thus the intended sounds are considered to be the target of speech production. To study the nature of speech production targets, one might measure the outcome of acoustic and/or articulatory trajectories. However, because acoustic signals and articulatory movements are variable, it is not easy to define what parameters the production system is trying to control. Instead we can examine the ways in which a production error is corrected to give us an estimate of the production target. This can be done by investigating how
a) Electronic mail: [email protected]
J. Acoust. Soc. Am. 138 (1), July 2015
speakers monitor the accuracy of speech production using sensory feedback. The most obvious sensory feedback associated with speech production is auditory feedback. Clinical reports show the effects of not hearing our own voice, including more variable articulation and acoustics among postlingually deafened individuals (Waldstein, 1990; Cowie and Douglas-Cowie, 1992; Schenk et al., 2003) or even with a short-term lack of auditory feedback achieved by turning on and off cochlear implants (Cowie and Douglas-Cowie, 1983; Higgins et al., 2001). More recently, laboratory examinations using real-time auditory perturbation paradigms, which modify speech signals with minimal delay, have allowed researchers to investigate how speakers correct their acoustics and/or articulatory movements when auditory feedback indicates that the produced speech contains errors. This type of compensatory production has been reported with various speech parameters, such as amplitude (e.g., Bauer et al., 2006), fundamental frequency (F0 hereafter; e.g., Burnett et al., 1998; Jones and Munhall, 2000), vowel formant frequency (e.g., Houde and Jordan, 1998; Purcell and Munhall,
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 129.100.253.75 On: Mon, 20 Jul 2015 14:31:47
2006b; Villacorta et al., 2007), and fricative noise (Shiller et al., 2009; Casserly, 2011). In vowel perturbation studies, the first formant (F1) is either increased or decreased in real time [along with the second formant (F2) in some studies] while speakers are producing a simple word. In response, speakers change the produced formant frequency in a direction opposite to the perturbation applied. For example, if F1 of the vowel /ɛ/ is increased, the formant structure of the perturbed feedback resembles that of /æ/. In response, the produced F1 may be lowered to a value comparable to that of /ɪ/. This compensatory formant production has been reported both with (1) ongoing, moment-to-moment production control, i.e., feedback-based motor correction (e.g., Purcell and Munhall, 2006a; Tourville et al., 2008; Cai et al., 2010), and (2) adaptive learning of the perturbation for future production control, i.e., updating the feedforward state of vowel motor plans (e.g., Houde and Jordan, 1998; Purcell and Munhall, 2006b; Villacorta et al., 2007). Importantly, speakers have been shown to be sensitive to this type of perturbation, compensating for relatively small manipulations (76 Hz of perturbation to initiate compensation; Purcell and Munhall, 2006a) and with short response times (108 ms to respond; Tourville et al., 2008). Just as auditory feedback is essential for controlling acoustic/articulatory details of speech production, somatosensory feedback also plays an important role. Deprivation of sensation from virtually all of the vocal tract surface above the glottis altered speech production (Putnam and Ringel, 1976; Scott and Ringel, 1971). For example, high vowels did not achieve the same degree of closure (Putnam and Ringel, 1976; Scott and Ringel, 1971), while low vowels were more retracted (Scott and Ringel, 1971).
A phantom sensation of jaw opening, induced by vibrating the masseter tendon, made speakers open their jaw less for the open vowel /a/ (Loucks and De Nil, 2001). Perturbations applied to the trajectory of articulatory movements also elicit compensations. For example, when a bilabial closure was prevented by the lower lip being pulled downward, the lips moved more to achieve the intended closure (Abbs and Gracco, 1984; Gracco and Abbs, 1989), and the timing of laryngeal activity was also reorganized for voicing (Munhall et al., 1994). Similarly, when speakers' jaw movements were perturbed mechanically while they opened their mouth or produced a speech segment, not only the trajectory of jaw movement but also lip movements compensated (e.g., Folkins and Abbs, 1975; Folkins and Zimmermann, 1982; Shaiman, 1989). A recent study showed that when the jaw was pulled forward, compensatory jaw movement occurred only within the speech context (Tremblay et al., 2003). These findings indicate that somatosensory sensations are part of a rich and dynamic speech production representation. The importance of auditory and somatosensory feedback has been established mostly in separate lines of work; thus, how the speech motor control system incorporates both feedback types is not fully understood. Perturbations applied singly to either auditory or somatosensory feedback may induce incompatible information in the speech control system. For example, with auditory perturbation, auditory
feedback indicates that the segment being produced is erroneous, while somatosensory feedback indicates that the execution of articulatory plans was on target. Under this circumstance, the incongruence between multimodal feedback systems needs to be somehow resolved. This incongruence has been suggested as a possible reason why compensatory formant production is almost always partial (see MacDonald et al., 2010 for a review of partial compensation). To disentangle the contributions of auditory and somatosensory feedback, recent studies by Feng et al. (2011) and Lametti et al. (2012) perturbed speakers' F1 and displaced their jaw position separately or concurrently. While Feng et al. (2011) found that auditory feedback was weighted more heavily than somatosensory feedback, Lametti et al. (2012) reported that some speakers relied more on somatosensory feedback. Lametti et al. (2012) argued that there is between-speaker variability in the magnitude of compensation, and that partial compensation results from averaging a wide range of magnitudes across speakers. However, even among those who did not compensate for the jaw perturbation, few compensated fully for the formant perturbation. Thus it is still possible that partial compensation is due to opposing auditory and somatosensory information. While it is important to examine the detailed mechanisms of how, and how much, these feedback systems contribute to speech production control, investigations must consider that the contributions of different modalities may not be homogeneous across all vowels. For example, high vowels like /i/ and /u/ involve lingual contact with the teeth and/or the hard palate, while such sensation is not present for open vowels. Because lingual sensations are robustly represented in the sensory cortex (see Penfield and Boldrey, 1937; Picard and Olivier, 1983 for reviews), speakers are very sensitive to such inputs.
Recall that the studies by Putnam and Ringel (1976) and Scott and Ringel (1971) showed that oral sensations were particularly important for monitoring tongue position and/or jaw closure of high vowels. In a recent study by Bouchard et al. (2013), cortical activity was measured directly from the surface of the ventral sensorimotor cortex (vSMC) while subjects were producing simple speech sounds. During production, spatial representation in vSMC specific to four articulators (i.e., lips, jaw, tongue, and larynx) was reported. While some areas of vSMC were shown to process many of the articulators, the areas uniquely dedicated to lip and tongue articulatory processes were larger than those for jaw and larynx activity. Although the spoken tokens examined in their study were not isolated vowels, their data strongly suggest that speech sounds that recruit more tongue engagement and lip posturing are more strongly represented in the somatosensory area than those relying on jaw and larynx activity. For the actual motor engagement of speech production, Alfonso and Baer (1982) measured electromyographic (EMG) data from the superior longitudinal muscle and the posterior part of the genioglossus while various vowels were produced in a fixed phonetic context. Peak genioglossus activity was relatively large for the diphthongs /e/ and /o/ due to their dynamic tongue movements, but among the
monophthongs they examined, genioglossus activity for /i/ and /u/ was noticeably larger (approximately 230 μV for /i/ and 140 μV for /u/) compared to the other monophthongs (60–80 μV). Moreover, the sagittal-plane trajectories of the tongue dorsum showed that /i, ɪ, and u/ had much more dynamic (larger) movements compared to /ɔ/, whose trajectory was well within that of a schwa. These articulatory data strongly suggest that the tongue engagement of the two closed vowels /i/ and /u/ (which require tongue blade constriction) was relatively large. Taken together, it is possible that oral sensations alone are rich enough to specify production control of high vowels. A difference in the strength and richness of somatosensory information may inform the way auditory feedback is used for speech production control. Thus different compensatory production may be observed if each modality of sensory feedback contributes to speech production control differently. The majority of formant perturbation studies have examined only the English vowel /ɛ/, with only a few exceptions (Chinese triphthong /iau/: Cai et al., 2010; continuous vowel segments: Cai et al., 2011; English /ʌ/ and /ə/: Reilly and Dougherty, 2013). It is unknown whether compensatory production in response to formant perturbation is homogeneous or heterogeneous across vowels. The aim of the current study was to compare compensation magnitude when F1 of different English monophthongs was perturbed in real time.

II. METHODS

A. Participants
Two hundred forty female speakers from the community of Western University in Canada participated in the current study. Their ages ranged from 17 to 34 yr, with a mean of 22.9 yr and a standard deviation of 3.2 yr. All but three speakers acquired Canadian English as their first language, most of whom (204 speakers) grew up in Southern Ontario. Sixteen speakers came from Northern Ontario and 17 from other Canadian provinces, two of whom were from Montreal and two from the Maritime provinces. No speakers were from Newfoundland and Labrador. The remaining three were from the U.S. (two from the Midwest and one from California). Canadian English is thought to be broadly homogeneous from Ottawa in Ontario to British Columbia, with more regional variation from Montreal eastward (Labov et al., 2005). Some aspects of the Canadian vowel space are known to be in transition (Boberg, 2010), but this large group represents a homogeneous sample of English. These speakers were randomly assigned to one of 12 conditions [6 vowels (/i, ɪ, ɛ, æ, ɔ, u/) × 2 directions of manipulation (increased vs decreased F1)], consisting of 20 speakers each. It is important to note that (1) the proportion of speakers from Southern Ontario was largest in all conditions and (2) there was otherwise no unequal or unbalanced number of speakers from any particular region in any of the conditions. All had normal hearing thresholds within the range of 500–4000 Hz.

The main effect of direction was not significant [p > 0.05] but, more importantly, there was an interaction of the factors [F(5,228) = 9.33, p < 0.05]. A number of post hoc comparisons could be performed; however, the condition that contributed most notably to the interaction was /i/ in the decrease direction. An independent-sample t-test revealed that the /i/ decrease condition (74.14 Hz) had a significantly smaller magnitude than the /i/ increase condition [161.47 Hz; t(38) = 4.04, p < 0.05]. By removing the /i/ vowel from the same ANOVA (vowel ×
direction), the interaction was no longer significant [F(4,190) = 1.64, p > 0.05], while the main effect of vowel remained significant [F(4,190) = 8.42, p < 0.05] and the effect of direction remained not significant [F(1,190) = 3.23, p > 0.05]. Post hoc analyses with the Scheffé method revealed that both /ɪ/ (187.62 Hz) and /ɛ/ (180.73 Hz) had significantly larger magnitudes compared to both /æ/ (154.17 Hz) and /ɔ/ (150.89 Hz), while /u/ (174.82 Hz) did not differ from the other vowels. Differences in the actual perturbation speakers received across vowel conditions might be important in determining
FIG. 3. (Color online) Average magnitude of observed perturbation for the first formant (a) increase and (b) decrease condition. The vertical line denotes the boundary between the ramp and hold phases.
how much they compensated. To test this, we calculated Pearson's correlation coefficient between each condition's average actual perturbation magnitude and compensation magnitude. For this analysis, the negative values of (1) observed perturbation magnitude in the decrease conditions and (2) compensation magnitude in the increase conditions were multiplied by −1, so that all 12 conditions could be examined at once. The correlation was not significant [r(12) = 0.45, p > 0.05]. Thus differences in the actual perturbation the groups received did not drive the group differences in compensation magnitude. The F1 frequencies in the /i/ decrease condition were not successfully perturbed because the F1 values were already very low. To illustrate, with a mean female F0 of 212.77 Hz (s.e. = 3.85 Hz) and F1 of 338.40 Hz (s.e. = 9.93 Hz), lowering F1 by 200 Hz would result in the perturbed F1 being lower than most speakers' F0. As can be seen in Fig. 3(b), mean perturbation was similar to the other vowels until approximately the 65th trial, a −100 Hz intended perturbation, for which the first harmonic could have been amplified by the perturbation. However, a larger intended perturbation would have diminishing effects on existing harmonics as the upper skirt of the filter (created by the pair of spectral poles) rolls off with the filter peak moving to frequencies lower than F0.
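The sign-normalization and correlation procedure just described can be sketched as below. The perturbation values are loosely modeled on Table I, but the compensation values are hypothetical placeholders, not the study's data:

```python
import numpy as np

# Hypothetical per-condition averages (6 vowels x 2 directions = 12 values).
# Decrease conditions have negative observed perturbations; increase
# conditions have negative compensations (speakers lower F1 in response).
pert = np.array([-74.0, -191.0, -182.0, -149.0, -162.0, -187.0,   # decrease
                 161.0,  184.0,  180.0,  159.0,  140.0,  163.0])  # increase
comp = np.array([ 10.0,   35.0,   45.0,   40.0,   38.0,   25.0,   # decrease
                 -20.0,  -30.0,  -48.0,  -42.0,  -35.0,  -28.0])  # increase

# Multiply the negative values by -1 so that all 12 conditions can be
# examined at once, then correlate perturbation with compensation.
pert_mag = np.where(pert < 0, -pert, pert)
comp_mag = np.where(comp < 0, -comp, comp)
r = np.corrcoef(pert_mag, comp_mag)[0, 1]
```

With all values expressed as positive magnitudes, a single Pearson coefficient can test whether conditions that happened to receive larger perturbations also compensated more.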
Based on the preceding correlation result, differences in compensation magnitude across vowels were not explained by perturbation magnitudes. We further examined whether compensation magnitude was related to the closedness of the vowels. The classification of "closed" versus "open" vowels was based on the conventional linguistic typology of jaw openness (International Phonetic Association, 1999). Although the rich lingual and oral sensation associated with "closed vowels" may not be directly related to jaw openness, the presumed strength of such sensation differentiates the two groups of vowels nonetheless. All conditions, except for /i/ in the decrease condition, were grouped into "closed" (/i/ increase and /ɪ/ and /u/ of both directions) versus "open" (/ɛ, æ, ɔ/ of both directions) vowels. Compensation magnitude was then submitted to an ANOVA with vowel closedness (closed vs open) and perturbation direction (increase vs decrease) as between-subjects factors. Results revealed that speakers compensated significantly more for open vowels [X̄ = 42.06 Hz, s.e. = 3.94; F(1,216) = 15.73, p < 0.05] than for closed vowels (X̄ = 23.68 Hz, s.e. = 3.31). Moreover, the increase condition had a significantly larger compensation [X̄ = 38.68 Hz, s.e. = 3.46; F(1,216) = 6.98, p < 0.05] than the decrease condition (X̄ = 27.73 Hz, s.e. = 3.94). However, the interaction was not significant [F(1,216) = 0.88, p > 0.05]. Compensation magnitude is a good index of how speakers responded to the perturbation they received but, as Figs. 3(a) and 3(b) show, the actual perturbation magnitude they received may have been somewhat less than intended. It is important to estimate the production change caused by the actual perturbation speakers received.
The proportional change in production was calculated by taking the normalized F1 value of a given trial and dividing it by the prior trial's difference between the feedback signal and the microphone signal. This definition was based on the finding of a lag-one autocorrelation of trial-to-trial F1 changes (Purcell and Munhall, 2006b). The measurement reflects how much a speaker changed their production from baseline in response to the perturbation they actually received. Averages of proportional change in production for each vowel condition can be seen in Figs. 4(a) (increase condition) and 4(b) (decrease condition). In both directions, the first 10 trials of the ramp phase (trials 41–50) show quite variable proportional change because naturally variable production was divided by a small perturbation magnitude. However, as the perturbation became larger, a more stable pattern emerged, seen in Figs. 4(c) (increase condition) and 4(d) (decrease condition), which magnify the portion of Figs. 4(a) and 4(b) during the hold phase (i.e., trials 91–110). To quantify this measurement, the hold phase's proportional
TABLE I. Average observed perturbation magnitude in hertz across all conditions. s.e., standard error.

Vowel   Decrease (s.e.)     Increase (s.e.)
/i/     −74.14 (19.30)      161.16 (9.53)
/ɪ/     −190.95 (4.37)      184.29 (5.05)
/ɛ/     −181.52 (5.37)      179.94 (3.80)
/æ/     −149.30 (8.05)      159.04 (8.93)
/ɔ/     −161.98 (9.23)      139.80 (8.10)
/u/     −187.08 (10.85)     162.56 (11.44)
Mean    −157.49 (9.53)      164.47 (8.03)
FIG. 4. (Color online) Average change in first formant (F1) production based on the actual F1 perturbation received in the preceding trial, in the F1 increase (a) and decrease (b) conditions. The vertical line denotes the boundary between the ramp and hold phases. Panels (c) and (d) show the same data with a magnified y axis for the increase and decrease conditions, respectively.
change was averaged for each vowel condition, as summarized in Table II. As can be seen, proportional changes among the closed vowels were generally smaller than those of the open vowels. We submitted the proportional change values from the hold phase for each vowel and direction condition (except the /i/ decrease condition) to an ANOVA with vowel closure (closed vs open) and perturbation direction (increase vs decrease) as between-subjects factors. The main effect of vowel closure was significant [F(1,7) = 8.62, p < 0.05], with open vowels having a larger proportional compensation (X̄ = 26.57%, s.e. = 3.50%) than closed vowels (X̄ = 13.46%, s.e. = 4.54%). However, neither perturbation direction [F(1,7) = 2.79, p > 0.05] nor the interaction of the two factors [F(1,7) = 0.52, p > 0.05] was significant. Group averages of proportional compensation across closed versus open vowels can be seen in Figs. 5(a) and 5(b) (with
a reduced y axis range to show the difference between the conditions). The current design manipulated F1 in hertz with a fixed perturbation magnitude (i.e., ±200 Hz); thus, the perceptual consequences of the perturbation were not homogeneous across conditions, and this might have affected the ways in which our subjects compensated. To take this into account, for each subject we transformed the baseline average F1 and the value +200 Hz above it (for subjects in the increase condition) or −200 Hz below it (decrease condition) into the Mel scale, then calculated the difference. This value gives the perceptual consequence of the perturbation. The condition averages of these values are summarized in Table III. Similarly, compensation magnitude was also transformed to Mel by subtracting the baseline average in Mel from the hold phase average in Mel. However, this value still reflects different perceptual
TABLE II. Average proportional change (%) in first formant (F1) production in response to the perturbation speakers actually received. s.e., standard error. The /i/ decrease condition was excluded (see text).

Closed vowels       /i/             /ɪ/             /u/             Mean
Decrease (s.e.)     n/a             2.27 (0.60)     10.68 (0.42)    6.48 (0.37)
Increase (s.e.)     19.67 (0.72)    25.35 (0.87)    9.32 (1.00)     18.11 (0.50)
Total mean (s.e.): 13.46 (0.37)

Open vowels         /ɛ/             /æ/             /ɔ/             Mean
Decrease (s.e.)     23.79 (0.92)    27.70 (1.35)    21.31 (0.94)    24.27 (0.69)
Increase (s.e.)     41.55 (1.00)    24.93 (1.16)    20.12 (1.33)    28.87 (0.61)
Total mean (s.e.): 26.57 (0.43)
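The per-trial measure summarized in Table II (a trial's baseline-normalized F1 divided by the perturbation actually received on the previous trial, i.e., feedback F1 minus microphone F1) can be sketched as below. This is one reading of the definition in the text, and the trial values are hypothetical:

```python
import numpy as np

def proportional_change(produced_f1, feedback_f1, baseline_mean):
    """Per-trial proportional change in F1 production.

    For trial t (t >= 1): (produced F1 at t minus baseline mean) divided
    by the perturbation received at t-1 (feedback F1 minus microphone F1).
    Compensatory responses yield negative ratios under this sign
    convention; magnitudes are reported as positive percentages.
    """
    produced = np.asarray(produced_f1, dtype=float)
    feedback = np.asarray(feedback_f1, dtype=float)
    received = feedback - produced          # actual perturbation per trial
    norm = produced - baseline_mean         # change from baseline
    return norm[1:] / received[:-1]

# Hypothetical three-trial series: +100/+150/+200 Hz received, with the
# speaker lowering F1 on the later trials (compensation).
pc = proportional_change([500.0, 498.0, 490.0], [600.0, 648.0, 690.0], 500.0)
```

Dividing by the previous trial's perturbation (rather than the current one) reflects the lag-one structure of trial-to-trial F1 changes noted in the text.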
TABLE IV. Proportional magnitude of compensation in the Mel scale. s.e., 1 standard error.

            /i/       /ɪ/       /ɛ/       /æ/       /ɔ/       /u/       Mean
Decrease    −0.68     1.99      19.53     16.25     14.54     8.80      12.22
(s.e.)      1.92      5.26      3.04      4.34      3.55      2.59      3.76
Increase    −18.17    −25.59    −41.69    −21.05    −16.04    −8.31     −21.81
(s.e.)      5.31      3.64      3.92      4.07      4.06      3.72      4.12
FIG. 5. (Color online) (a) Average change in first formant production based on the actual F1 perturbation received in the preceding trial, across closed and open vowels. The vertical line denotes the boundary between the ramp and hold phases. (b) The same figure shown with a magnified y axis. Shading around the lines indicates 1 standard error.
consequences of perturbation magnitude; thus, we divided the compensation magnitude in Mel by the perturbation magnitude in Mel for each subject to normalize the data; this indicates the proportional change of formant values in Mel. The condition averages are summarized in Table IV. The values of proportional change based on the Mel scale were submitted (except for the /i/ decrease condition) to an ANOVA with vowel closedness and perturbation direction as between-subjects factors. Both vowel closedness [F(1,216) = 14.99, p < 0.05] and perturbation direction [F(1,216) = 16.77, p < 0.05] were significant. Open vowels had larger compensation (X̄ = 21.52 Mel, s.e. = 1.76) than closed vowels (X̄ = 12.57 Mel, s.e. = 2.05), and the increase condition elicited a larger compensation (X̄ = 21.81 Mel, s.e. = 1.92) than the decrease condition (X̄ = 12.22 Mel, s.e. = 1.82). The interaction of the two factors was not significant [F(1,216) = 0.22, p > 0.05]. These results are essentially identical to those for compensation magnitude analyzed in raw hertz.

TABLE III. Average magnitude of perturbation in Mel across conditions. s.e., 1 standard error.

Vowel   Decrease (s.e.)    Increase (s.e.)
/i/     241.66 (2.69)      197.75 (1.85)
/ɪ/     195.36 (2.18)      167.99 (1.24)
/ɛ/     172.26 (1.68)      149.67 (1.47)
/æ/     151.04 (1.26)      132.64 (1.30)
/ɔ/     160.98 (1.57)      140.22 (1.21)
/u/     225.52 (2.30)      186.96 (1.27)
Mean    191.14 (1.95)      162.54 (1.39)
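The Mel conversions underlying Tables III and IV can be sketched as below. The paper does not state which mel formula was used, so the common 2595·log10(1 + f/700) variant is assumed here, and the F1 values are hypothetical:

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumption; the paper does not specify).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

baseline_f1 = 580.0                   # hypothetical baseline F1 (Hz)
hold_f1 = 560.0                       # hypothetical hold-phase average (Hz)

# Perceptual size of a +200 Hz perturbation for this baseline (Table III).
pert_mel = hz_to_mel(baseline_f1 + 200.0) - hz_to_mel(baseline_f1)

# Compensation in Mel, normalized by the perturbation in Mel (Table IV).
comp_mel = hz_to_mel(hold_f1) - hz_to_mel(baseline_f1)
prop_comp_pct = 100.0 * comp_mel / pert_mel
```

Because the mel scale is compressive, the same 200 Hz spans more mels for low-F1 vowels such as /i/ and /u/, and in the decrease direction, which is consistent with the pattern of values in Table III.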
The threshold of compensation was also compared across conditions. As in Mitsuya et al. (2013), each speaker's threshold of compensation was determined as the first trial of three consecutive productions in which the speaker's F1 baseline average was exceeded by 3 standard errors in the direction opposite to the perturbation. With this definition, those who (1) changed their formant value in the same direction as the perturbation or (2) had more variable baseline production did not yield a threshold point (see Table V). A chi-square test of goodness of fit revealed that there was no difference in the number of speakers who yielded a threshold across experimental conditions (χ² = 1.44, df = 11, p > 0.05). Based on this threshold point, we then estimated the actual perturbation each speaker had received by averaging the three preceding trials. Speakers in decrease conditions yielded a negative value for this measure, and those values were multiplied by −1 so that both directions could be compared simultaneously. The estimated perturbation magnitudes needed to reach a production threshold were compared between closed (X̄ = 62.18 Hz, s.e. = 5.68 Hz) and open vowels (X̄ = 71.24 Hz, s.e. = 4.82 Hz) with an independent-sample t-test. The result was not significant [t(212) = 1.22, p > 0.05]. To see whether the threshold of compensation was explained by absolute F1 value, we performed a regression analysis predicting threshold magnitude from each speaker's baseline average F1; it was not significant [R² = 0.08, F(1,212) = 1.67, p > 0.05]. That there was no difference in the magnitude of auditory perturbation needed to elicit compensatory production across the closed versus open groups suggests that both groups were equally sensitive to auditory perturbation. In the current study, thresholds were defined based on the variability of baseline productions.
Therefore vowels with less production variability would yield a smaller compensation threshold, assuming that an auditory error is defined with reference to production variability. An independent-sample t-test comparing the variability (standard deviation) of baseline production across closed versus open vowels was significant [t(238) = 6.73, p < 0.01], showing that closed vowels (X̄ = 19.04 Hz, s.e. = 0.98 Hz) had smaller production variability than open vowels (X̄ = 30.71 Hz, s.e. = 1.43 Hz). This result suggests that even though speakers' compensation thresholds

TABLE V. The number of speakers who yielded a threshold point in each condition.

            /i/   /ɪ/   /ɛ/   /æ/   /ɔ/   /u/
Decrease    16    15    19    19    17    18
Increase    19    19    20    18    16    18
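The threshold definition above (the first trial of three consecutive productions exceeding the baseline mean by 3 standard errors in the direction opposite to the perturbation) can be sketched as below; the function and variable names are illustrative, not from the study's analysis code:

```python
import numpy as np

def compensation_threshold(baseline_f1, trial_f1, direction):
    """Index (into trial_f1) of the first of three consecutive trials
    exceeding the baseline mean by 3 standard errors opposite to the
    perturbation, or None if no such run occurs.

    direction: +1 for an F1-increase perturbation (compensation is
    downward), -1 for a decrease (compensation is upward).
    """
    base = np.asarray(baseline_f1, dtype=float)
    se = base.std(ddof=1) / np.sqrt(len(base))
    bound = base.mean() - direction * 3.0 * se
    trials = np.asarray(trial_f1, dtype=float)
    hits = trials < bound if direction > 0 else trials > bound
    for i in range(len(hits) - 2):
        if hits[i] and hits[i + 1] and hits[i + 2]:
            return i
    return None
```

Speakers who moved with the perturbation, or whose variable baseline widens the 3-standard-error bound, return None, matching the conditions under which no threshold point was obtained in Table V.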
were not different between closed and open vowels, the smaller production variability means that the "production error" of closed vowels was larger at the threshold point than that of open vowels. As reported by Kewley-Port and Watson (1994), the magnitude of the perceptual threshold for formant discrimination depends on the vowel's formant value; that is, the smaller the formant value, the smaller the discrimination threshold tends to be. The production data of the current study showed a positive linear relationship between the baseline F1 value and its variability [r(240) = 0.42, p < 0.05]; this mirrors the perceptual threshold of formant discrimination. However, the compensation threshold was not consistent with such a relationship. Taken together, this implies that, in hertz terms, closed vowels were less sensitive to auditory feedback than open vowels. It is possible that compensation magnitude across conditions was related to speakers' baseline formant values. That is, within each condition, compensation might have been correlated with how high or low speakers' naturally produced F1 was. To test this, Pearson's correlation coefficient was calculated between speakers' baseline average F1 and their compensation magnitude in each condition. Except for /i/ in the increase condition [r(20) = −0.46, p < 0.05], none was significant (all p > 0.05), indicating that, generally, absolute F1 within a vowel category was unrelated to the production response to altered auditory feedback. Speakers' F1 and F2 values collected across seven vowels during the model order estimation phase were compared across all 12 conditions to confirm that speakers in the conditions had similar vowel formant structure. For the F1 and F2 values separately, two mixed-design ANOVAs were conducted with vowel (/i, ɪ, e, ɛ, æ, ɔ, u/) as a within-subject factor and experimental condition (12; 6 vowels × 2 directions) as a between-subjects factor.
For both F1 and F2, Mauchly's test of sphericity was significant (p < 0.05); thus, Greenhouse–Geisser adjustments were used for the omnibus analyses. For both F1 and F2, the main effect of vowel was significant [F1: F(3.15,717.59) = 6896.65, p < 0.05; F2: F(2.10,477.35) = 6075.51, p < 0.05], while condition was not [F1: F(11,228) = 0.78, p > 0.05; F2: F(11,228) = 0.76, p > 0.05]. More importantly, the interaction of vowel and condition was not significant [F1: F(66,1368) = 1.12, p > 0.05; F2: F(66,1362) = 1.27, p > 0.05]. These results indicate that speakers across conditions had similar formant values for each of the seven vowels collected during model order estimation. Testing each vowel's formants as independent data points is conventional; however, this method does not necessarily take into account the shape of an individual's vowel space (a vowel constellation). For example, the mean and variability of the formant values of a given vowel do not capture each value's relationship to those of the other vowels [see Figs. 6(a) and 6(b)]. That is, similar average formant values across conditions do not necessarily indicate that the shapes of speakers' vowel constellations are also similar across conditions, and such shape differences might have contributed to group differences in compensatory formant production. To test whether the shapes of vowel constellations differed across conditions, we used Procrustes analysis. Briefly, this analysis superimposes a comparison shape
and a reference shape at the center of mass (centroid) of the shapes, then re-scales the shapes so that the sum of distances from the centroid to each point is 1. After this normalization, the comparison shape is rotated about the centroid so that the sum of squared differences between corresponding vowels in the two shapes is minimized. This minimized sum is used as a "dissimilarity index." Our reference vowel constellation was the mean normalized shape, obtained by superimposing all individual shapes at the centroid and re-scaling. The dissimilarity index was calculated for each speaker's vowel constellation, and the values were submitted to a one-way ANOVA with the 12 vowel conditions as a between-subject factor. The result was not significant [F(11,228) = 1.06, p > 0.05], indicating that speakers in all conditions had a similarly shaped vowel space, independent of its size in the F1/F2 acoustic space. To verify the sensitivity of this analysis, the dissimilarity indices of English vowel spaces produced by native English speakers and native Japanese speakers in the study of Mitsuya et al. (2011) were compared. Using the reference shape generated from the present study, based on 240 native English speakers, a dissimilarity score was calculated for each speaker. An independent-samples t-test revealed that the dissimilarity scores among native English speakers were significantly
FIG. 6. (Color online) Hypothetical formant values of three vowels produced by three speakers. The dashed lines connecting the vowels indicate each speaker's vowel space shape. While both (a) and (b) were created based on the same formants, the vowel spaces differ in shape as points are assigned to different speakers: (a) shows more shape variability while (b) shows similar shapes.
Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms. Download to IP: 129.100.253.75 On: Mon, 20 Jul 2015 14:31:47
smaller (mean = 0.027, s.e. = 0.006) than those of the native Japanese group (mean = 0.226, s.e. = 0.018). The overall vowel space of our speakers is plotted in Fig. 7(a), which displays elliptical production categories for vowels in the F1/F2 acoustic space. These ellipses were created under the assumption that F1 and F2 were normally distributed; the ellipses are tilted due to covariation of F1 and F2. Although this method of category estimation has been used in other studies (MacDonald et al., 2010; MacDonald et al., 2011; Mitsuya et al., 2011; Mitsuya et al., 2013), the assumption of normality had not been tested. Because we collected a large set of production data, including seven English vowels from 240 speakers, for model order estimation, we also analyzed their vowel space using a two-dimensional (F1/F2) histogram. Figure 7(b) shows vowel categories with interpolated contours based on the two-dimensional (2D) histogram. The darkest area within each category indicates where most of the data are clustered and is very similar to the vowel space estimated using normal distributions as in past studies [Fig. 7(a)]. To examine the distribution of formant values for skewness, we used the most common measure, γ1. As summarized in Table VI, γ1 values for F1 of /e/ and F2 of /O/ and /u/ were significantly different from the normal distribution value of 0. However, the actual γ1 values for these formants were rather small, considering that γ1 can range
FIG. 7. (Color online) (a) Average vowel space of the 240 speakers of the present study. The centroid of each ellipse represents the average F1/F2 value for that vowel. The solid and dashed ellipses represent one and two average standard deviations, respectively. (b) Seven vowel categories on the two-dimensional histogram. Interpolated lines were drawn within each distribution.
approximately from −3 to +3. Thus even though their distributions were not normal, they were not severely skewed.

IV. DISCUSSION
The idea that speech production is represented in both auditory and somatosensory domains has been proposed previously (e.g., Guenther, 1995; Purcell and Munhall, 2006a; Nasir and Ostry, 2006; Tremblay et al., 2008; Houde and Nagarajan, 2011). Partial compensation for auditory perturbations has been hypothesized to arise because these two types of feedback signify different information, which the sensorimotor system must then integrate into a production response. Whether the speech production system processes feedback information differently from each modality has been examined previously; yet, to our knowledge, it has only been tested with a few different vowels (Lametti et al., 2012; Feng et al., 2011). The differences in sensory contribution estimated by Lametti et al. (2012) and Feng et al. (2011) were based on how much speakers compensated and are specific to the contrasts between the vowels tested. Consequently, it is not known whether auditory and somatosensory feedback are homogeneously processed across the entire vowel space. The current study investigated compensatory formant production across many vowels. Our results suggest that the processing of auditory feedback for production control is heterogeneous across different vowels; thus we postulate that somatosensory feedback is weighed differently depending on vowel type, such that (1) rich somatosensory information may reliably specify the articulatory postures of closed vowels and, consequently, (2) auditory feedback may be weighed less in controlling the production of these vowels. A difference in feedback weighting is a plausible account of the current data; yet there are puzzling findings as well. The difference in compensation magnitude between closed versus open vowels was significant overall, but in the increase condition this difference might have been driven mostly by the large difference between /E/ and /u/.
The other vowels in the increase condition overlap substantially [as readily seen in Fig. 4(c)]. The hypothesis about somatosensory sensitivity due to lingual contact of the closed vowels does not explain this pattern, and we do not have a clear explanation for it. The fact that the increase condition in general showed statistically larger compensation magnitudes might indicate that there is something specific about this direction of manipulation. One could also argue that the current data pattern may be explained by other accounts. We would like to briefly consider some of those possibilities. First, auditory and somatosensory feedback may carry similar weight, but closed vowels may have more active somatosensory inputs. Although this account cannot be teased apart from the hypothesis of different feedback weights, we have shown that compensatory behavior across vowels is not homogeneous, and the pattern we observed is reasonably explained by a differing influence of auditory versus somatosensory feedback. Further investigations, such as with simultaneous articulatory tracking, are needed to address these issues.
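To make the weighting hypothesis concrete, the following is a deliberately simplified toy model of our own construction (not a published model and not the study's method): steady-state compensation opposes the F1 perturbation in proportion to a vowel-specific auditory weight, which is assumed smaller for closed vowels. The weights and vowel labels are hypothetical.

```python
# Toy illustration of the feedback-weighting hypothesis (our construction):
# compensation opposes the F1 perturbation in proportion to a vowel-specific
# auditory weight w_aud, assumed smaller for closed vowels.
def predicted_compensation(perturbation_hz, w_aud):
    """Steady-state compensation (Hz) opposing the perturbation, scaled by w_aud."""
    return -w_aud * perturbation_hz

w_aud = {"closed /i/": 0.1, "open /ae/": 0.3}  # hypothetical weights
results = {v: predicted_compensation(200.0, w) for v, w in w_aud.items()}
```

Under these assumed weights, a +200 Hz perturbation yields a smaller compensatory F1 change for the closed vowel than for the open vowel, mirroring the qualitative pattern reported above.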
TABLE VI. Skewness index, γ1, of the first and second formants' (F1 and F2, respectively) distributions. z scores of γ1 for the seven vowels collected from our 240 speakers were calculated based on a standard error for skewness, SS, of 0.158.

          /i/            /I/            /e/            /E/            /æ/            /O/            /u/
        F1     F2     F1     F2     F1      F2      F1     F2     F1     F2     F1      F2     F1     F2
γ1    −0.06  −0.11   0.13   0.26   0.46   −0.11   −0.12  −0.09  −0.28  −0.18  −0.29   0.46   0.18   0.34
z     −0.38  −0.70   0.82   1.65   2.91a  −0.70   −0.76  −0.57  −1.77  −1.14  −1.83   2.91a  1.14   2.15a

a p < 0.05.
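The z scores in Table VI are consistent with the large-sample approximation to the standard error of skewness, SS ≈ √(6/N), which for N = 240 gives the 0.158 stated in the caption. A minimal check, assuming that approximation:

```python
import math

n = 240
ss = math.sqrt(6.0 / n)   # large-sample SE of skewness; ~0.158 for N = 240
z = 0.46 / ss             # e.g., gamma1 = 0.46 for F1 of /e/
print(round(ss, 3), round(z, 2))   # 0.158 2.91
```

The resulting z of 2.91 matches the flagged entry for F1 of /e/ in the table; with real data, γ1 itself could be computed with, e.g., `scipy.stats.skew`.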
Second, compensation differences between closed versus open vowels might have been an artifact of an inherent difference in perception, such as sensitivity to, and discriminability of, perturbed feedback. Recall that the threshold of compensation was similar between closed versus open vowels, while production variability was smaller among closed vowels. Thus we concluded that the closed vowels are less sensitive to auditory feedback. However, it is possible that the sensitivity to auditory feedback was low only because smaller F1 values tend to have small variability, and therefore that it had little to do with closedness per se. Beckman et al. (1995) indeed reported that open vowels tend to have slightly larger production variability; therefore articulatory targets might also be slightly larger than for closed vowels. In the current study, F1 variability was considerably lower among closed vowels compared to open vowels. Although scaling plays a role, it is not the only contributor to the variability difference between closed versus open vowels; that is, small formant values do not necessarily have small variability. For example, in the study by Mitsuya et al. (2011), five Japanese vowels /i, e, a, o, ɯ/ were collected and F1, F2, and their variability were calculated. While F1 of /o/ was much smaller than that of /e/, its variability was much larger. Similarly, F1 of /ɯ/ was slightly higher than that of /i/, but the variability of /ɯ/ was smaller than that of /i/. Variability could also be due to how an underlying representation of a vowel category is defined. The current study cannot tease apart the weighting hypothesis from the scaling hypothesis; a further study of compensatory behavior with the five Japanese vowels might be useful to disentangle the scaling confound. We did not measure discriminability of perturbed feedback; thus, it is not possible to account for this factor. However, we can speculate on a related point. Mitsuya et al.
(Mitsuya et al., 2011; Mitsuya et al., 2013) have reported that compensation is related to the perceptual quality of the vowel being produced: if the feedback sound is perceived as less prototypical of the intended vowel, a larger compensation is observed. Because the perturbation magnitude was uniform across conditions in hertz (i.e., a 200 Hz increase or decrease), the perceptual consequence of the perturbation was unique to each condition, because the perception of vowel category is not linear in hertz. Discriminability is related to noticeability; that is, a more readily noticeable difference is, by definition, more discriminable, and vice versa. In the current experiment, we observed larger compensation among the open vowels relative to the closed vowels. However, when we transformed baseline F1 values and perturbed F1 values onto the Mel scale, the amount of perceptual change due to the perturbation was
larger among the closed vowels. Moreover, when proportional compensation magnitude was expressed for each speaker in Mel (by subtracting baseline F1 in Mel from the hold-phase average in Mel and dividing by the perturbation magnitude in Mel), the closed vowels still showed smaller proportional compensation magnitudes compared to open vowels. If participants' compensation behavior were dominated by perceptual differences, proportional compensation magnitude in Mel should have been larger for closed vowels. However, this was not the case. Therefore we do not believe that compensation behavior is driven solely at the auditory or perceptual level, and we maintain our postulation that another feedback system (i.e., somatosensory) contributes to the differences. Third, the difference in compensatory production across vowels might have been related to each vowel's unique articulatory posture and the feasibility of articulatory changes (i.e., the motoric allowance to compensate). If so, however, we should expect certain patterns across directions of perturbation. With the F1-lowering perturbation, closed vowels should have more motoric room to compensate, whereas the already widely open /æ/ may not have enough room for compensation. In the opposite case, increasing F1 in the feedback should elicit lowered F1 in production, which corresponds to jaw closing; thus closed vowels would exhibit less compensation than in the other direction, while for an open vowel such as /æ/, a slightly more closed posture is easily achievable. These patterns were not observed in the current data. However, because of the nebulous relationship between articulatory configurations and the corresponding formant structures, it is impossible for us to account for this relationship without actual measurement of articulator movements. This is especially true for the corner vowels (e.g., see quantal theory: Stevens, 1989), for which changes in articulatory configuration may not result in much difference in acoustics.
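Returning to the Mel-scale comparison above, a short sketch shows why a uniform 200 Hz perturbation is a larger perceptual (Mel) change for a low-F1 closed vowel than for a high-F1 open vowel. The conversion formula and the example F1 values are our assumptions; the paper does not state which Mel formula it used.

```python
import math

def hz_to_mel(f_hz):
    # O'Shaughnessy's Mel approximation (one common variant; assumed here).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Hypothetical baseline F1 values: a closed vowel (low F1) and an open vowel
# (high F1), each receiving the study's uniform +200 Hz F1 perturbation.
shifts = {}
for label, f1 in [("closed", 300.0), ("open", 750.0)]:
    shifts[label] = hz_to_mel(f1 + 200.0) - hz_to_mel(f1)

# Because the Hz-to-Mel mapping is compressive at higher frequencies, the same
# 200 Hz shift yields a larger Mel change for the low-F1 (closed) vowel.
```

This is the sense in which the perceptual consequence of a fixed perturbation in hertz differs across vowels.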
The current study did not measure articulatory movements in response to auditory perturbation; thus no clear explanation can be offered. Few studies have examined the effect of formant perturbation on articulatory maneuvers, one exception being Neufeld et al. (2013), who used electromagnetic articulography. Further studies with such measurements will be useful for understanding the acoustic-articulatory relationship in the context of speech compensation. Understanding how information from various modalities interplays is crucial for a comprehensive speech production model. However, many aspects of production processes are still not well understood. For example, there is no agreement about whether the unit of processing is the same across sensory modalities. In Hickok's Hierarchical State Feedback Control (HSFC) model of speech production (Hickok, 2012,
2014), there is a functional division of auditory and articulatory processes, each with a slightly different unit of processing. For the auditory network, the syllable functions as a target, whereas for articulatory feature control, the phoneme is the functional unit (Hickok, 2012, 2014). Thus, according to this proposal, auditory feedback maintains syllabic integrity while somatosensory feedback controls fine articulatory trajectories at the phonemic level. Hickok argues that because vowels are simultaneously both a syllable (as a nucleus without a consonantal onset and coda) and a phoneme, both types of feedback are processed with varying weights to control production (Hickok, 2012). On the other hand, Sato et al. (2014) argue that speech production units are at the phonemic level and that the production target is represented multimodally, in both the auditory and somatosensory domains. This proposal is more parsimonious than Hickok's in terms of levels of processing and the nature of the representation of speech production, because there is no functional division between syllable and phoneme, nor a change in dominant feedback type depending on the status of the unit. In the context of compensatory formant production, both proposals seem able to explain the heterogeneous compensation observed in the current study. Further investigations are needed to identify the nature of speech production targets. Formant compensation is reported to be automatic (Munhall et al., 2009), and typically many speakers do not report noticing that feedback was perturbed (i.e., a different vowel quality), especially with incremental perturbations like those of the present and similar previous studies (e.g., Purcell and Munhall, 2006b).
This lack of awareness of the perturbation occurs despite the fact that the feedback participants received (even with compensatory production) was still far from the formant structure of their normal baseline, and likely crossed a neighboring category boundary. Even when speakers noticed the exact nature of the perturbation (i.e., changing vowel quality), this awareness did not make a difference in compensation magnitude (Mitsuya et al., 2013). However, the level of (un)awareness of the perturbation could differ with more closed vowels. If somatosensory feedback is what speakers use to control the production of closed vowels, and consequently they do not compensate much for errors in auditory feedback, the auditory feedback they receive would more clearly be outside the production category of the intended vowel. It is still not well understood how and why such large acoustic differences, which would have a different perceptual status during listening, are not readily noticed by speakers. An additional interesting observation from our data is the consistent partial compensation throughout the ramp and hold phases once the threshold is reached. A linear relationship between perturbation magnitude and compensation magnitude was carefully examined in MacDonald et al. (2010), where speakers received small incremental perturbations that well exceeded the maximum perturbation applied in the current study. The speakers in that study compensated for the perturbation of /E/ at a rate of approximately 25% until about 200 Hz, where compensation asymptoted (MacDonald et al., 2010). This
pattern was consistent with the current study [Figs. 4(a) and 4(b)], where the proportional change in formant compensation showed low variability across the ramp phase. Slightly different results were obtained by Katseff et al. (2012), who reported that small perturbations (e.g., 50 Hz with /E/) elicited near-full compensation. In the current study, a perturbation of approximately 50 Hz was applied on trials 52–53; however, around those trials, average proportional compensation for the vowel /E/ was well below 100%. Many studies of real-time formant perturbation, once again, have been conducted with the vowel /E/. This particular vowel may have been used repeatedly because (1) its F1 and F2 are sufficiently far apart that manipulation of one formant has little effect on the other; (2) its F1 value is neither low nor high, allowing perturbation in either direction; and (3) it is a mildly open vowel, affording the speaker easy compensation in either direction. However, conclusions on production control and its representation drawn from those studies may be specific to this vowel. Other vowels may have very different control representations from that of /E/. Generalizing one vowel's control representation to other vowels assumes that the representation of vowels is homogeneous across many parameters. Without thoroughly examining other vowels and the relative relationships between vowels, our understanding of the speech control mechanism may be severely warped, as the current study revealed: each vowel may be represented differently across sensory feedback types. In conclusion, the current data suggest that the sensorimotor control of vowels is not fixed across sensory modalities.
Smaller acoustic compensation with closed vowels suggests that somatosensory feedback for those vowels may richly specify their production and that the contribution of different feedback types (auditory versus somatosensory) is heterogeneous across different vowels. Continued study of this phenomenon is needed (1) to disentangle the acoustic-articulatory relationship by examining the articulatory compensation induced by the auditory perturbation the current study employed, and (2) to identify how such information is processed into a coherent production plan and perceptual unit, as well as the mechanism of perception during articulation.

ACKNOWLEDGMENTS
We would like to thank Mark Tiede at Haskins Laboratories for the use of his offline F0 and formant estimation tools, and Gabriella Lesniak, Bonnie Cooke, Samidha Joglekar, Lauren Vannus, Emily Barrett, Lindsay Gibson, and Victoria Stone for their assistance in data collection and analyses. This research was supported by a Canada Foundation for Innovation (CFI) grant awarded to the National Centre for Audiology at Western University, the Natural Sciences and Engineering Research Council of Canada (NSERC), and a Ministry of Research and Innovation (Ontario) Early Researcher Award (MRI ERA) awarded to D.W.P.

Abbs, J. H., and Gracco, V. L. (1984). "Control of complex motor gestures: Orofacial muscle responses to load perturbations of lip during speech," J. Neurophysiol. 51, 705–723.
Alfonso, P. J., and Baer, T. (1982). "Dynamics of vowel articulation," Lang. Speech 25, 151–173. Bauer, J. J., Mittal, J., Larson, C. R., and Hain, T. C. (2006). "Vocal responses to unanticipated perturbations in voice loudness feedback," J. Acoust. Soc. Am. 119, 2363–2371. Beckman, M. E., Jung, T. P., Lee, S. H., de Jong, K., Krishnamurthy, A. K., Ahalt, S. C., Cohen, K. B., and Collins, M. J. (1995). "Variability in the production of quantal vowels revisited," J. Acoust. Soc. Am. 97, 471–490. Boberg, C. (2010). The English Language in Canada: Status, History and Comparative Analysis (Cambridge University Press, Cambridge). Bouchard, K. E., Mesgarani, N., Johnson, K., and Chang, E. F. (2013). "Functional organization of human sensorimotor cortex for speech articulation," Nature 495, 327–332. Burnett, T. A., Freedland, M. B., Larson, C. R., and Hain, T. C. (1998). "Voice f0 responses to manipulations in pitch feedback," J. Acoust. Soc. Am. 103, 3153–3161. Cai, S., Ghosh, S. S., Guenther, F. H., and Perkell, J. S. (2010). "Adaptive auditory feedback control of the production of formant trajectories in the Mandarin triphthong /iau/ and its pattern of generalization," J. Acoust. Soc. Am. 128, 2033–2048. Cai, S., Ghosh, S. S., Guenther, F. H., and Perkell, J. S. (2011). "Focal manipulations of formant trajectories reveal a role of auditory feedback in the online control of both within-syllable and between-syllable speech timing," J. Neurosci. 31, 16483–16490. Casserly, E. D. (2011). "Speaker compensation for local perturbation of fricative acoustic feedback," J. Acoust. Soc. Am. 129, 2181–2190. Cowie, R. J., and Douglas-Cowie, E. (1983). "Speech production in profound postlingual deafness," in Hearing Science and Hearing Disorders, edited by M. E. Lutman and M. P. Haggard (Academic, New York), pp. 183–230. Cowie, R. J., and Douglas-Cowie, E. (1992).
Postlingually Acquired Deafness: Speech Deterioration and the Wider Consequences (Mouton De Gruyter, New York), 320 pp. Feng, Y.-Q., Gracco, V. L., and Max, L. (2011). "Integration of auditory and somatosensory error signals in the neural control of speech movements," J. Neurophysiol. 106, 667–679. Folkins, J. W., and Abbs, J. H. (1975). "Lip and jaw motor control during speech: Responses to resistive loading of the jaw," J. Speech Hear. Res. 18, 207–220. Folkins, J. W., and Zimmermann, G. N. (1982). "Lip and jaw interaction during speech: Responses to perturbation of lower-lip movement prior to bilabial closure," J. Acoust. Soc. Am. 71, 1225–1233. Gracco, V. L., and Abbs, J. H. (1989). "Sensorimotor characteristics of speech motor sequences," Exp. Brain Res. 75, 586–598. Guenther, F. H. (1995). "Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production," Psychol. Rev. 102, 594–621. Hickok, G. (2012). "Computational neuroanatomy of speech production," Nature Rev. Neurosci. 13, 135–145. Hickok, G. (2014). "The architecture of speech production and the role of the phoneme in speech processing," Lang. Cognit. Neurosci. 29, 2–20. Higgins, M. B., McCleary, E. A., and Schulte, L. (2001). "Articulatory changes with short-term deactivation of the cochlear implants of two prelingually deafened children," Ear Hear. 22, 29–41. Houde, J. F., and Jordan, M. I. (1998). "Sensorimotor adaptation in speech production," Science 279, 1213–1216. Houde, J. F., and Nagarajan, S. S. (2011). "Speech production as state feedback control," Front. Hum. Neurosci. 5(82), 1–14. International Phonetic Association (1999). Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet (Cambridge University Press, Cambridge). Jones, J. A., and Munhall, K. G. (2000). "Perceptual calibration of F0 production: Evidence from feedback perturbation," J. Acoust. Soc. Am. 108, 1246–1251.
Katseff, S., Houde, J., and Johnson, K. (2012). "Partial compensation for altered auditory feedback: A tradeoff with somatosensory feedback?," Lang. Speech 55, 295–308. Kewley-Port, D., and Watson, C. S. (1994). "Formant-frequency discrimination for isolated English vowels," J. Acoust. Soc. Am. 95, 485–496. Labov, W., Ash, S., and Boberg, C. (2005). The Atlas of North American English: Phonetics, Phonology and Sound Change (Walter de Gruyter, Berlin), pp. 216–221. Lametti, D. R., Nasir, S. M., and Ostry, D. J. (2012). "Sensory preference in speech production revealed by simultaneous alteration of auditory and somatosensory feedback," J. Neurosci. 32, 9351–9358.
Loucks, T. M., and De Nil, L. F. (2001). "The effects of masseter tendon vibration on nonspeech oral movements and vowel gestures," J. Speech Lang. Hear. Res. 44, 306–316. MacDonald, E. N., Goldberg, R., and Munhall, K. G. (2010). "Compensation in response to real-time formant perturbations of different magnitude," J. Acoust. Soc. Am. 127, 1059–1068. MacDonald, E. N., Purcell, D. W., and Munhall, K. G. (2011). "Probing the independence of formant control using altered auditory feedback," J. Acoust. Soc. Am. 129, 955–966. Mitsuya, T., MacDonald, E. N., Purcell, D. W., and Munhall, K. G. (2011). "A cross-language study of compensation in response to real-time formant perturbation," J. Acoust. Soc. Am. 130, 2978–2986. Mitsuya, T., Samson, F., Ménard, L., and Munhall, K. G. (2013). "Language dependent vowel representation in speech production," J. Acoust. Soc. Am. 133, 2993–3003. Munhall, K. G., Löfqvist, A., and Kelso, J. S. (1994). "Lip-larynx coordination in speech: Effects of mechanical perturbations to the lower lip," J. Acoust. Soc. Am. 95, 3605–3616. Munhall, K. G., MacDonald, E. N., Byrne, S. K., and Johnsrude, I. (2009). "Speakers alter vowel production in response to real-time formant perturbation even when instructed to resist compensation," J. Acoust. Soc. Am. 125, 384–390. Nasir, S. M., and Ostry, D. J. (2006). "Somatosensory precision in speech production," Curr. Biol. 16, 1918–1923. Neufeld, C., Purcell, D. W., and Van Lieshout, P. (2013). "Articulatory compensation to second formant perturbations," POMA 19, 060097. Orfanidis, S. J. (1988). Optimum Signal Processing: An Introduction (McGraw-Hill, New York), 590 pp. Penfield, W., and Boldrey, E. (1937). "Somatic motor and sensory representation in the cerebral cortex of man as studied by electrical stimulation," Brain 60, 389–443. Picard, C., and Olivier, A. (1983). "Sensory cortical tongue representation in man," J. Neurosurg. 59, 781–789. Purcell, D. W., and Munhall, K. G. (2006a).
"Compensation following real-time manipulation of formants in isolated vowels," J. Acoust. Soc. Am. 119, 2288–2297. Purcell, D. W., and Munhall, K. G. (2006b). "Adaptive control of vowel formant frequency: Evidence from real-time formant manipulation," J. Acoust. Soc. Am. 120, 966–977. Putnam, A. H., and Ringel, R. L. (1976). "A cineradiographic study of articulation in two talkers with temporarily induced oral sensory deprivation," J. Speech Lang. Hear. Res. 19, 247–266. Reilly, K. J., and Dougherty, K. E. (2013). "The role of vowel perceptual cues in compensatory responses to perturbations of speech auditory feedback," J. Acoust. Soc. Am. 134, 1314–1323. Sato, M., Schwartz, J. L., and Perrier, P. (2014). "Phonemic auditory and somatosensory goals in speech production," Lang. Cognit. Neurosci. 29, 41–43. Schenk, B. S., Baumgartner, W. D., and Hamzavi, J. S. (2003). "Effect of the loss of auditory feedback on segmental parameters of vowels of postlingually deafened speakers," Auris Nasus Larynx 30, 333–339. Scott, C. M., and Ringel, R. L. (1971). "Articulation without oral sensory control," J. Speech Hear. Res. 14, 804–818. Shaiman, S. (1989). "Kinematic and electromyographic responses to perturbation of the jaw," J. Acoust. Soc. Am. 86, 78–88. Shiller, D. M., Sato, M., Gracco, V. L., and Baum, S. R. (2009). "Perceptual recalibration of speech sounds following speech motor learning," J. Acoust. Soc. Am. 125, 1103–1113. Stevens, K. N. (1989). "On the quantal nature of speech," J. Phonet. 17, 3–45. Tourville, J. A., Reilly, K. J., and Guenther, F. H. (2008). "Neural mechanisms underlying auditory feedback control of speech," Neuroimage 39, 1429–1443. Tremblay, S., Houle, G., and Ostry, D. J. (2008). "Specificity of speech motor learning," J. Neurosci. 28, 2426–2434. Tremblay, S., Shiller, D. M., and Ostry, D. J. (2003). "Somatosensory basis of speech production," Nature 423, 866–869. Villacorta, V. M., Perkell, J. S., and Guenther, F. H. (2007).
"Sensorimotor adaptation to feedback perturbations of vowel acoustics and its relation to perception," J. Acoust. Soc. Am. 122, 2306–2319. Waldstein, R. S. (1990). "Effects of postlingual deafness on speech production: Implications for the role of auditory feedback," J. Acoust. Soc. Am. 88, 2099–2114.