Ultrafest VIII
Preface .......................................... 3
Program .......................................... 5
Program & Rooms at a Glance ..................... 11
Building Plan ................................... 12
Keynotes & Talks ................................ 14
Posters & Demos ................................. 56
Preface

This supplement contains the abstracts of the eighth edition of Ultrafest, held in Potsdam, Germany, October 4–6, 2017. Ultrafest VIII continues a tradition that started at Haskins Laboratories in 2002 with a few scientists sharing an interest in optimizing ultrasound imaging for linguistic research. From this rather spontaneous first meeting, Ultrafest has grown into an international meeting gathering scientists, speech and language pathologists, engineers and students. Following the first meeting at Haskins Laboratories (USA), subsequent editions were held as follows:

Ultrafest II, April 2004. University of British Columbia
Ultrafest III, April 2005. University of Arizona
Ultrafest IV, September 2007. New York University
Ultrafest V, March 2010. Haskins Laboratories
Ultrafest VI, November 2013. Queen Margaret University
Ultrafest VII, December 2015. Hong Kong University

In fifteen years, ultrasound imaging has become a popular technique for studying language and speech. It has advanced our understanding of the articulatory mechanisms from which speech originates and contributed to creating an interface between language representations and their physical implementations in speech. Ultrasound imaging is now employed in many research domains, e.g., first and second language acquisition, cross-linguistic and dialectal research, endangered languages, speech disorders, speech dynamics, speech modeling, and speech motor control. Given the growing number of ultrasound studies published in high-ranking journals, it is now beyond question that the ultrasound technique has transformed experimental phonetics and laboratory phonology. It not only expands methodological possibilities, it also allows us to address new questions and revisit longstanding theoretical debates. Ultrafest has been a successful meeting since its inception.
It has provided a friendly and inspiring platform for scientists and professionals from various backgrounds to discuss their current research, engage in problem-solving discussions and develop new collaborations. Furthermore,
Ultrafest has provided an opportunity for new generations of doctoral researchers working with ultrasound to be introduced to the community and engage in fruitful exchanges. As in previous editions, Ultrafest VIII continues to promote this approach. In addition to presenters, a number of scientists, speech and language pathologists and students not yet using ultrasound imaging are joining us to explore how their research can benefit from this technique. We are delighted to host the eighth edition of Ultrafest and to have Marianne Pouplier (LMU, Munich, Germany) and Khalil Iskarous (USC, Los Angeles, USA) as keynote speakers. We are looking forward to a productive meeting.

The Ultrafest VIII team

Conference organization
The meeting includes oral presentations, poster presentations and hands-on demos. Poster presentations are valued equally with oral presentations. Because formal talks are sometimes not the most suitable way to present methodological research, we included hands-on demos to facilitate interaction with the audience and provide time for more questions than is usually possible in a formal presentation format.
Program

Wednesday – October 4, 2017

08:30 – 09:30
Registration
09:30 – 10:00
Opening Introductions
10:00 – 11:00
Keynote: Marianne Pouplier
Cross-linguistic differences in coarticulation and their implications for phonotactics
11:00 – 11:30
Coffee Break

Oral Session 1: Phonology & Phonetics
11:30 – 12:00
Shuwen Chen, Peggy Pik Ki Mok, Mark Tiede, Wei-rong Chen & Douglas H. Whalen Investigating the production of Mandarin rhotics using ultrasound imaging
12:00 – 12:30
Patrycja Strycharczuk, Donald Derrick & Jason Shaw Midsagittal correlates of tongue lateralization
12:30 – 13:00
Petroula Mousikou, Patrycja Strycharczuk, Alice Turk & James M. Scobbie Morphological effects on articulation
13:00 – 14:30
Lunch
Oral Session 2: Phonology & Phonetics
14:30 – 15:00
Amanda L. Miller (presented by Mary Beckman) C-V coarticulation in consonants with multiple lingual constrictions
15:00 – 15:30
Sawsan Alwabari Articulatory constraints and sensitivity to coarticulation of Arabic pharyngealization
15:30 – 16:00
Coffee Break
16:00 – 17:30
Poster & Demo Session 1
Oral Session 3: Cross-linguistic Research
17:30 – 18:00
Eleanor Lawson, Jane Stuart-Smith & Lydia Mills Using ultrasound to investigate articulatory variation in the GOOSE vowel in the British Isles
18:00 – 18:30
Stefano Coretta Vowel duration and tongue root advancement in Italian and Polish
18:30
Welcome Drink
Thursday – October 5, 2017

Oral Session 4: Methodological Research
10:00 – 10:30
Alan Wrench & Peter Balch A 3D biomechanical equilibrium model of the tongue
10:30 – 11:00
Steven M. Lulich Acquisition and Analysis of 3D/4D Ultrasound Recordings of Speech*
11:00 – 11:30
Coffee Break
Oral Session 5: Methodological Research
11:30 – 12:00
Douglas H. Whalen, Mark K. Tiede & Boram Kim Quantification of Pivots in Ultrasound Images
12:00 – 12:30
Alessandro Vietti, Alessia Pini, Lorenzo Spreafico & Simone Vantini Towards cross-subject comparison of tongue profiles. A Functional Data Analysis approach*
12:30 – 13:00
James M. Scobbie & Joanne Cleland Dorsal Crescents: Area and radius-based mid-sagittal measurements of comparative velarity*
* The talk is accompanied by a hands-on demonstration. For the abstract, see the "Keynotes & Talks" section.
13:00 – 14:30
Lunch
Oral Session 6: Clinical Research
14:30 – 15:00
Joanne Cleland, James M. Scobbie, Zoe Roxburgh, Cornelia Heyde & Alan Wrench UltraPhonix: Ultrasound Visual Biofeedback for the Treatment of Speech Sound Disorders
15:00 – 15:30
Giovanna Lenoci & Irene Ricci Coarticulation and stability of speech in Italian children who stutter and normally fluent children: an ultrasound study
15:50
Bus Tour & Dinner
Friday – October 6, 2017

10:00 – 11:00
Keynote: Khalil Iskarous Dynamical Principles of Hydrostat Movement: Worm, Octopus Tongue
11:00 – 11:30
Coffee Break
Oral Session 7: First & Second Language Acquisition
11:30 – 12:00
Natalia Zharkova Changes in dynamic lingual coarticulation of /s/ between eleven and thirteen years old
12:00 – 12:30
Kevin D. Roon, Jaekoo Kang & Douglas. H. Whalen Using ultrasound feedback to aid in L2 acquisition of Russian palatalization
12:30 – 13:00
Thomas Kaltenbacher What the [ɫ] [ ɻ ] they doing? Ultrasound in the EFL pronunciation classroom
13:00 – 14:30
Lunch
14:30 – 16:00
Poster & Demo Session 2
16:00 – 16:30
Coffee Break
Oral Session 8: Dialectal & Field Research
16:30 – 17:00
Jonathan Havenhill, Elizabeth C. Zsiga, One Tlale Boyer & Stacy Petersen Ultrasound as a tool for language documentation: The production of labio-coronal fricatives in Setswana
17:00 – 17:30
Andrea Calabrese & Mirko Grimaldi Micro-variation in a metaphonic system: Acoustic-articulatory evidence from the Tricase dialect
17:30
General Discussion & Closing Remarks
Poster & Demo Sessions

Wednesday – October 4, 2017: Session 1
Poster (Room: S12)
01
Chiara Meluzzi, Chiara Celata, Scott Moisik & John Esling A lingual-laryngeal ultrasound view of aryepiglottic trills
02
Tamás Gábor Csapó, Andrea Deme, Tekla Etelka Gráczi & Alexandra Markó Comparison of distance measures in tongue contour traces of ultrasound images
03
Phil Howson An Ultrasound Investigation of Upper Sorbian Rhotics
04
Hannah King & Anqi Liu An ultrasound and acoustic study of the rhotic suffix in Mandarin
05
Catherine Laporte & Steven M. Lulich Towards automated segmentation of the moving tongue surface in 3D ultrasound
06
Lorenzo Spreafico, Anna Matosova, Alessandro Vietti & Vincenzo Galatà Development of two head-probe stabilization devices for speech research and applications
07
Mayuki Matsui Tongue-root coordination for voicing in Russian palatalized and “velarized” stops
08
Pertti Palo & Steve Cowen Simultaneous Recording of Tongue Ultrasound and Oral Airflow
09
Elisa Reunanen & Maija S. Peltola Articulatory movements and coarticulation in Finnish vowels

Demo
Room: S13 | D1 (16:00 – 16:40), D2 (16:50 – 17:30)
D1
Wei-rong Chen, Mark Tiede & Shuwen Chen An optimization method for correction of ultrasound probe-related contours to head-centric coordinates
D2
Clara Rodríguez & Daniel Recasens R scripts for displaying lingual trajectories using ultrasound data
Room: H02 | D3 (16:00 – 16:40), D4 (16:50 – 17:30) D3
Joan Ma & Alan Wrench Swallowing assessment by ultrasound analysis of hyoid, larynx and tongue timing patterns
D4
Gunnar Olafsson (Clarius Mobile Health) Presentation of Clarius C7 scanner
Friday – October 6, 2017: Session 2
Poster (Room: F2)
01
Lauretta Cheng, Murray Schellenberg, Colin Jones & Bryan Gick Cross-linguistic evidence for lateral bracing in speech
02
Joanne Cleland, Lisa Crampin, Alan Wrench, Natalia Zharkova, Susan Lloyd & Pertti Palo Visualising Speech: A Protocol for using Ultrasound to Assess Cleft-Type Speech Characteristics
03
Eleanor Lawson An ultrasound investigation of coda /r/ tongue gesture timing in spontaneous speech
04
Scott Lewis, Esther de Leeuw & Erez Levon An ultrasound investigation into gestural drift and identity in Merseyside English speakers living in London, UK
05
Sam Kirkham & Claire Nance The articulation of cross-linguistic vowel contrasts in bilinguals: [ATR] vowels in Twi and [TENSE] vowels in Ghanaian English
06
Louise McKeever, Joanne Cleland & Jonathan Delafield-Butt Speech motor control and fine motor control in children with autism: An investigation using ultrasound tongue imaging (speech) and optical motion tracking (fine motor)
07
Stefanos Tserkezis, Jan Ries & Aude Noiray Lingual coarticulation in German typical versus reading disordered children
08
Fabienne Westerberg, Eleanor Lawson, James Scobbie & Stephen Cowen Is UTI suitable for fieldwork? Practical observations from six weeks of recording in Sweden
09
Fabienne Westerberg, Jane Stuart-Smith, Eleanor Lawson, James Scobbie & Stephen Cowen Variation in tongue gesture and acoustics for the Swedish /iː/ vowel

Demo
Room: S13 | D1 (14:30 – 15:10), D2 (15:20 – 16:00)
D1
Steven M. Lulich Acquisition and Analysis of 3D/4D Ultrasound Recordings of Speech
D2
Alan Wrench Real-time tongue contour fitting and vocal tract carving
Room: H02 | D3 (14:30 – 15:10), D4 (15:20 – 16:00) D3
James M. Scobbie & Joanne Cleland Dorsal Crescents: Area and radius-based mid-sagittal measurements of comparative velarity
D4
Alessandro Vietti, Alessia Pini, Lorenzo Spreafico & Simone Vantini Towards cross-subject comparison of tongue profiles. A Functional Data Analysis approach
Program & Rooms at a Glance

Wednesday – October 4
Time            Event                                   Rooms
08:30 – 09:30   Registration                            F1
09:30 – 10:00   Opening Introductions                   H02
10:00 – 11:00   Keynote                                 H02
11:00 – 11:30   Coffee Break                            F1
11:30 – 13:00   Oral Session 1                          H02
13:00 – 14:30   Lunch                                   Mensa
14:30 – 15:30   Oral Session 2                          H02
15:30 – 16:00   Coffee Break                            F1
16:00 – 17:30   Poster & Demo Session 1                 H02, S12, S13
17:30 – 18:30   Oral Session 3                          H02
18:30           Welcome Drink                           F1

Thursday – October 5
Time            Event                                   Rooms
10:00 – 11:00   Oral Session 4                          H02
11:00 – 11:30   Coffee Break                            F2
11:30 – 13:00   Oral Session 5                          H02
13:00 – 14:30   Lunch                                   Mensa
14:30 – 15:30   Oral Session 6                          H02
15:50           Bus Tour & Dinner                       F1

Friday – October 6
Time            Event                                   Rooms
10:00 – 11:00   Keynote                                 H02
11:00 – 11:30   Coffee Break                            F2
11:30 – 13:00   Oral Session 7                          H02
13:00 – 14:30   Lunch                                   Mensa
14:30 – 16:00   Poster & Demo Session 2                 F2, H02, S13
16:00 – 16:30   Coffee Break                            F2
16:30 – 17:30   Oral Session 8                          H02
17:30           General Discussion & Closing Remarks    H02
Building Plan

House 6, Ground Floor: rooms F1, F2, H02 and the Mensa; entrance on the train-station side.
House 6, 1st Floor: rooms S12 and S13.
KEYNOTES & TALKS
Keynote 1
Wednesday – October 4, 2017
Cross-linguistic differences in coarticulation and their possible implications for phonotactics
Marianne Pouplier
Institute of Phonetics, LMU, Germany
[email protected]
It has repeatedly been suggested that the complexity of the consonantal phonotactic inventory of a language is related to the consonantal coarticulation pattern: typologically unusual consonant clusters have been claimed to exhibit a preference for low-overlap coarticulation (Wright 1996, Chitoran 2016, Pouplier et al. 2017), possibly including a reduced flexibility under prosodic variation (Wright 1996). In this talk, we present several experiments which put these hypotheses to the test. Data from Russian and Slovak challenge the hypothesis that consonant combinations which are presumably auditorily difficult to recover, or syllables containing syllabic consonants, are resistant to prosodic variation. We further present a cross-linguistic comparison of coarticulatory patterns from seven languages. The extent to which segmentally identical consonant clusters can truly differ in their overlap pattern between languages is not clear, since direct cross-linguistic comparisons have rarely been done. First results suggest that languages differ in the range of coarticulatory patterns: while all languages included in our sample have relatively high overlap for some of the clusters, only some of the languages additionally show low-overlap patterns. Thus certain clusters seem to favor high-overlap patterns cross-linguistically, while others allow for a relatively high degree of cross-linguistic variation.

References
Chitoran, I. (2016). Relating the sonority hierarchy to articulatory timing patterns: A cross-linguistic perspective. In M. J. Ball & N. Müller (Eds.), Challenging Sonority: Cross-linguistic Evidence (pp. 45-62). Equinox.
Pouplier, M., Marin, S., Hoole, P., & Kochetov, A. (2017). Speech rate effects in Russian onset clusters are modulated by frequency, but not auditory cue robustness. Journal of Phonetics, 64, 108-126.
Wright, R. A. (1996). Consonant clusters and cue preservation in Tsou. PhD dissertation, UCLA.
Keynote 2
Friday – October 6, 2017
Dynamical Principles of Hydrostat Movement: Worm, Octopus, Tongue
Khalil Iskarous
University of Southern California, USA
[email protected]
The tongue’s ability to assume a large number of minimally different shapes, and to shift from one shape to another in a few tens of milliseconds, allows for the existence of many thousands, probably hundreds of thousands, of contrastive systems in the dialects of the world’s languages. How is this biological organ, operating within the larger hydrostatic system of the muscles of speech, able to accomplish the evolving sociocultural marvel of phonological diversity? The work of Kier and Smith (1985) on the principles of muscular hydrostatic movement, and Stone’s (1990, 1991) work on the hydrostatic nature of tongue movement in speech, are a major step towards understanding how the tongue, through its semi-orthogonal systems of muscles, is able to accomplish a large diversity of configurations by leveraging small movements in one dimension to accomplish movement in others. The work reported here presents recent empirical findings on a worm hydrostat, Caenorhabditis elegans, and on the octopus arm, aimed at understanding in greater detail the dynamical principles of hydrostat movement, which allow, simultaneously, for extraordinary flexibility and ease of control.
Oral Session 1
Wednesday – October 4, 2017
Investigating the production of Mandarin rhotics using ultrasound imaging
Shuwen Chen1, Peggy Pik Ki Mok1, Mark Tiede2, Wei-rong Chen2, D. H. Whalen2,3,4
1 The Chinese University of Hong Kong, Hong Kong
2 Haskins Laboratories, USA
3 City University of New York, USA
4 Yale University, USA
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Background: The production of the North American English rhotic /ɹ/ has been extensively studied because it can be produced with either a bunched or retroflex gesture while having a very similar acoustic signature (Delattre & Freeman, 1968; Zhou et al., 2008). Mandarin Chinese has a rhotic /ɹ/ that resembles English /ɹ/. It has traditionally been described as being articulated with the tongue tip curling up, and identified as a “retroflex consonant” in the literature (Chao, 1968). Later studies using X-ray and EMA, however, reported no retroflexion in the production of Mandarin prevocalic “retroflex consonants”, and the constriction was reported to be around the post-alveolar region (Ladefoged & Maddieson, 1996; Lee & Zee, 2003). Unlike English, Mandarin does not allow /ɹ/ in the coda position underlyingly. Mandarin /ɹ/ occurs in the postvocalic position when a sound undergoes r-suffixation, which is a characteristic of Mandarin spoken in Northern China. Given the acoustic similarities between English and Mandarin /ɹ/, it is interesting to examine how Mandarin /ɹ/ is articulated, especially in postvocalic and syllabic positions, and whether phonological differences between prevocalic and postvocalic /ɹ/ would lead to articulatory differences.

Methods: We have collected data from ten native Mandarin speakers (five men and five women) who grew up in Northern China. The data were collected with a Siemens ACUSON X300 ultrasound system at a frame rate of 36 Hz. During recording, the participants’ heads were unconstrained. The tongue surface contours were corrected to head-based coordinates using fifteen blue dots on participants’ heads and on the ultrasound probe (Figure 1). Participants’ lips were painted blue to enhance contrast for tracking their movement. The participants produced prevocalic /ɹ/ with /ɿ a ɤ u/ vowels, postvocalic /ɹ/ with /i ɿ ʅ y u a ɤ/ vowels, and syllabic /ɚ/. The other three Mandarin “retroflex consonants” /tʂ/, /tʂh/ and /ʂ/ were also recorded for comparison with prevocalic /ɹ/. We also compared syllabic /ɹ/ with diminutive suffix /ɹ/ in the same segmental contexts (for example, /y.̩r/ and /yr/).
Figure 1: The position of tracking dots.
Results and discussion: Our preliminary data confirmed that the articulatory gesture of Mandarin prevocalic /ɹ/ did not show retroflexion, and it did not resemble English bunched /ɹ/ either. Mandarin prevocalic /ɹ/ sounds were articulated in the same way as the other three “retroflex consonants” in Mandarin (/tʂ/, /tʂh/ and /ʂ/): a post-alveolar fricative rather than a “retroflex consonant”. Mandarin postvocalic /ɹ/ was articulated differently from prevocalic /ɹ/. Similar to English /ɹ/, it was produced with either a retroflex or bunched tongue shape; however, retroflex gestures were more common in Mandarin than reported for English. Syllabic /ɹ/ was also produced with either retroflex or bunched gestures. Unlike English /ɹ/, the production of Mandarin postvocalic /ɹ/ did not involve lip protrusion: Figure 4 shows that the degree of lip protrusion in /u/ was larger than in the vowel /a/, syllabic /ɹ/, and postvocalic and prevocalic /ɹ/.

Figure 2: Tongue shapes of /ɹ/, /tʂ/, /tʂh/ and /ʂ/ by Participant W1
Figure 3: Retroflex (Participant M1) and bunched (Participant M3) tongue shapes when producing /yr/
Figure 4: Lip protrusion in syllabic /ɚ/, vowel /a/, pre- and postvocalic /ɹ/, and /u/ by Participant W1.
References
Chao, Y.-R. (1968). A grammar of spoken Chinese. Berkeley: University of California Press.
Delattre, P., & Freeman, D. C. (1968). A dialect study of American r’s by x-ray motion picture. Linguistics, 6(44), 29–68.
Ladefoged, P., & Maddieson, I. (1996). The Sounds of the World’s Languages. Oxford: Blackwell Publishing.
Lee, W. S., & Zee, E. (2003). Standard Chinese (Beijing). Journal of the International Phonetic Association, 33(1), 109-112.
Zhou, X., Espy-Wilson, C. Y., Boyce, S., Tiede, M., Holland, C., & Choe, A. (2008). A magnetic resonance imaging-based articulatory and acoustic study of “retroflex” and “bunched” American English /r/. The Journal of the Acoustical Society of America, 123(6), 4466–4481.

Keywords: speech production, rhotics, Mandarin
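Mapping probe-frame tongue contours into head-based coordinates via tracked reference dots, as in the Methods above, is (in the 2D midsagittal case) a rigid-alignment problem. The sketch below is illustrative only and is not the authors' implementation: it assumes the reference dots' coordinates are known in both frames and fits a least-squares 2D rotation plus translation (closed-form Procrustes/Kabsch); all point data are made up.

```python
import math

def rigid_fit(src, dst):
    """Least-squares 2D rotation + translation mapping points src -> dst
    (closed-form 2D Procrustes/Kabsch solution). Returns a function that
    applies the fitted transform to any (x, y) point."""
    n = len(src)
    csx = sum(x for x, y in src) / n; csy = sum(y for x, y in src) / n
    cdx = sum(x for x, y in dst) / n; cdy = sum(y for x, y in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy      # centered source point
        bx, by = dx - cdx, dy - cdy      # centered destination point
        num += ax * by - ay * bx         # cross term -> sin(theta)
        den += ax * bx + ay * by         # dot term   -> cos(theta)
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)       # translation after rotation
    ty = cdy - (s * csx + c * csy)
    def apply(p):
        x, y = p
        return (c * x - s * y + tx, s * x + c * y + ty)
    return apply

# Probe-frame reference dots and the same dots in head-based coordinates
# (here: a known 90-degree rotation plus a shift, for illustration).
probe = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
head  = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
to_head = rigid_fit(probe, head)
# Any tongue-contour point recorded in the probe frame can now be mapped:
x, y = to_head((0.5, 0.5))
```

In practice the published optimization works on more reference dots and must handle tracking noise, but the core step of recovering a frame-to-frame rigid transform has this shape.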
Oral Session 1
Wednesday – October 4, 2017
Midsagittal correlates of tongue lateralization: Combined EMA and ultrasound evidence
Patrycja Strycharczuk1, Donald Derrick2, Jason Shaw3
1 University of Manchester, UK
2 University of Canterbury, New Zealand
3 Yale University, USA
[email protected],
[email protected],
[email protected]
Background: Much of articulatory research to date has focused on tongue movement in the midsagittal plane. In comparison, less is known about parasagittal tongue movement, or how such movement may affect aspects of articulation in the midsagittal view. In a recent EMA study of /l/ in Australian English, Ying et al. (2016) found that the lateral channel formation varies in timing, depending on the vocalic context and the syllable position. We replicate that finding in this current paper which focuses on lateral production in New Zealand English (NZE). Using synchronized EMA and ultrasound data, we show that lateralization takes longer to form in canonical codas compared to other contexts, and in front vowels compared to central or back vowels. Both of these suggest a close link between dorsal retraction and lateralization, which is further confirmed by ultrasound data. Methods: Seven native speakers of NZE were recorded reading test items including /l/ preceded by the FLEECE, KIT (which is mid and central in NZE) or THOUGHT vowel. The morpho-syntactic boundary following /l/ was systematically varied, following the design by Sproat and Fujimura (1993). Each speaker read 10 repetitions of 15 items, which yielded 150 tokens per speaker. Simultaneous EMA, ultrasound and audio data were captured using the co-registration set-up described in Derrick et al. (2015). Tongue movements were recorded based on data from five EMA sensors placed on the tongue surface: three midsagittal sensors (TT, TB and TD), and two parasagittal sensors (TR and TL). In addition, tongue movement
in the midsagittal plane was captured using a GE Logiq E (version 11) ultrasound system at 60 fps. EMA and ultrasound signals were synchronized based on acoustic landmarks in the simultaneously recorded and independently synchronized acoustic signal. Following Ying et al. (2016), we calculated the Tongue Lateralization Index for the entire /Vl/ duration, quantified as the difference in height between the relatively more lowered parasagittal sensor (TR.z or TL.z) and the estimated height of the tongue blade in the midsagittal plane at the same horizontal location as TR and TL.

Results and discussion: The lateralization index typically rises early in the vowel, reaching a peak at ca. 35% of the /Vl/ duration. We analyzed the time interval required to reach this peak value in a mixed-effects linear regression model with random intercepts for speaker and block, and fixed effects of the preceding vowel, the morpho-syntactic condition, and the /Vl/ duration. The lateralization interval was longer when the preceding vowel was FLEECE, compared to KIT or THOUGHT. On average, lateralization took longer to achieve in the canonical coda position (word-final pre-consonantal) than in canonical onsets (word-initial or word-medial /l/ in monomorphemes). The lateralization interval was also longer in relatively longer vowels. The vowel and morpho-syntactic effects suggest a link between dorsal retraction and lateral channel formation: the more retracted the vowel, the less time it takes to maximally lower the sides of the tongue. Furthermore, coda /l/s, which are typically characterized by increased tongue dorsum retraction compared to onsets, also show increased lateralization intervals.

Figure 1: GAM-smoothed midsagittal tongue contours for speaker S2. Tongue tip is on the right.

The ultrasound data allow us to explore the relationship between lateralization and retraction in more detail, since ultrasound provides more information on the tongue dorsum and tongue root. We compared midsagittal ultrasound frames corresponding to maximum dorsal retraction, maximum lateralization, and maximum tongue tip raising. As exemplified in Figure 1 for speaker S2, tongue position at the lateralization maximum was very closely linked to that at maximal retraction, albeit not perfectly aligned: the tongue root and body are relatively more advanced at the lateralization maximum. This allows us, to an extent, to model parasagittal movement based on midsagittal tongue posture. Such estimations, however, are complicated by lateral asymmetry, which is also ubiquitous in our data and shows more idiosyncratic behavior depending on speaker and phonological context.

References
Derrick, D., Best, C., & Fiasson, R. (2015). Non-metallic ultrasound probe holder for co-collection and co-registration with EMA. In Proceedings of the 18th ICPhS.
Sproat, R., & Fujimura, O. (1993). Allophonic variation in English /l/ and its implications for phonetic implementation. Journal of Phonetics, 21, 291-311.
Ying, J., Shaw, J., Best, C., Proctor, M., Derrick, D., & Carignan, C. (2016). Articulation and Representation of Laterals in Australian-accented English. Poster presented at LabPhon 15, Cornell University.

Keywords: speech production, co-registration, lateralization
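The Tongue Lateralization Index described in the Methods (Ying et al. 2016) can be sketched as a per-sample height difference plus a time-to-peak measure. This is a minimal illustrative sketch, not the study's code; all trajectory values below are invented, and `blade_z` stands in for the estimated midsagittal blade height at the parasagittal sensors' horizontal position.

```python
def lateralization_index(blade_z, tr_z, tl_z):
    """Tongue Lateralization Index per sample: height difference between
    the estimated midsagittal blade and the more lowered parasagittal
    sensor (TR or TL), following the definition in the abstract."""
    return [b - min(tr, tl) for b, tr, tl in zip(blade_z, tr_z, tl_z)]

def time_to_peak(index, vl_duration):
    """Interval needed to reach the peak index value, and that interval
    as a percentage of the /Vl/ duration."""
    peak_i = max(range(len(index)), key=lambda i: index[i])
    t = peak_i / (len(index) - 1) * vl_duration
    return t, 100 * t / vl_duration

# Illustrative trajectory sampled over one /Vl/ interval: the sides of
# the tongue lower early in the vowel, so the index rises to an early peak.
blade = [1.0, 1.2, 1.5, 1.4, 1.3, 1.2, 1.1, 1.0, 1.0, 1.0, 1.0]
tr    = [0.9, 0.8, 0.5, 0.6, 0.7, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9]
tl    = [0.95, 0.9, 0.7, 0.7, 0.8, 0.85, 0.9, 0.95, 0.95, 0.95, 0.95]
tli = lateralization_index(blade, tr, tl)
t_peak, pct = time_to_peak(tli, vl_duration=200.0)  # duration in ms
```

The time-to-peak value (here in ms and as a percentage of /Vl/) is the quantity the abstract feeds into the mixed-effects regression.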
Oral Session 1
Wednesday – October 4, 2017
Morphological effects on articulation
Petroula Mousikou1,2,3, Patrycja Strycharczuk4, Alice Turk5 & James M. Scobbie1
1 CASL, Queen Margaret University, UK
2 REaD, Max Planck Institute for Human Development, Germany
3 Dept. of Psychology, Royal Holloway University of London, UK
4 Linguistics and English Language, The University of Manchester, UK
5 Dept. of Linguistics and English Language, The University of Edinburgh, UK
[email protected],
[email protected],
[email protected],
[email protected]
Background: Converging empirical evidence suggests that the morphological structure of a word influences its articulatory implementation. For example, acoustic durational differences between morphemic and non-morphemic segments (i.e., past-tense /d/ and /t/ in words like “rapped” vs. /d/ and /t/ in words like “rapt”) have been observed in English (Losiewicz, 1992). Similarly, inter-gestural timing for bimorphemic words has been shown to be more variable than for apparently homophonous monomorphemic words in Korean (Cho, 2002). Such findings are inconsistent with traditional theories of speech production, which posit that articulatory levels of processing do not have access to morphological information (e.g., Chomsky & Halle, 1968; Levelt et al., 1999), but can be explained by theories that postulate that the morphological structure of a word may be encoded in its phonetic realization (see Cho, 2002). However, a serious limitation of previous studies is that they typically fail to control for important lexical and orthographic variables that are known to affect articulation systematically (Hanique & Ernestus, 2012). The present study overcomes these methodological shortcomings by using nonce words within a word-learning paradigm to determine whether there are morphological effects on articulation.

Methods: Ten monolingual native speakers of Scottish English, aged 18-33 years (M = 22.2, SD = 4.3, 3 males), with no reported visual, hearing, reading, or speech impairments participated in the study. The experimental stimuli consisted of 16 nonce words: 8 nouns (dard, gord, lerd, mord, sard, tord, vard, zord) and 8 verbs (to dar, to gor, to ler, to mor, to sar, to tor, to var, to zor), which were associated with 8 unfamiliar objects and 8 unfamiliar definitions of actions, respectively. Participants were trained aurally on these associations over two days, achieving 100% accuracy by the end of training. On the third consecutive day, they were tested on the newly learnt words. Their task involved completing orally the phrases/sentences “A __. It’s a __. It’s a __ again” with the name of the visually presented object (e.g., zord), and the phrases/sentences “To __. Yesterday, Tessa __. Yesterday, Tessa __ again” with the appropriate form of the verb that corresponded to the visually presented definition (e.g., to zor and zorred, respectively). Each item was presented six times, resulting in a total of 96 productions per speaker. We captured the tongue movement with a high-speed ultrasound system (121 fps), and also recorded a time-aligned audio signal.

Results and discussion: The acoustic analysis involved measuring whole-item duration in the two conditions (nouns vs. verbs) and contexts (sentence-final vs. pre-vocalic). The results revealed no significant durational differences between the two conditions in either context, indicating no effect of morphological structure on acoustic duration. In the articulatory analysis, we compared the tongue shapes in the two morphological conditions and in both prosodic contexts for each speaker separately. For each item, the tongue contour was manually traced throughout the production of the vowel and the coda. SS-ANOVA (Davidson, 2006) comparison of tongue contours was then performed at the point of maximum tongue tip (TT) raising during final stop production. We hypothesized that progressive coarticulation from the /r/ might be reflected as a bunching gesture, so that tautomorphemic /d/ in nouns might be more /r/-like in shape than the suffix /d/ in verbs. Our analysis revealed an inconsistent pattern of results across speakers. Example data from two speakers are shown in Figure 1. For Speaker A (top panels), the tongue dorsum was more retracted (thus, more /r/-like) for nouns than for verbs. However, for Speaker B (bottom panels), the tongue shapes in the two conditions and contexts were identical. We interpret our results in the context of previous findings in this research domain (e.g., Sugahara & Turk, 2009; Song et al., 2013) and consider their implications for extant theories of speech production.

Figure 1: Mean tongue contour at maximum TT raising

References
Cho, T. (2002). Effects of morpheme boundaries on intergestural timing: Evidence from Korean. Phonetica, 58, 129–162.
Chomsky, N., & Halle, M. (1968). The Sound Pattern of English. New York: Harper & Row.
Davidson, L. (2006). Comparing tongue shapes from ultrasound imaging using smoothing spline analysis of variance. The Journal of the Acoustical Society of America, 120, 407–415.
Hanique, I., & Ernestus, M. (2012). The role of morphology in acoustic reduction. Lingue e linguaggio, 11, 147–164.
Levelt, W.J.M., Roelofs, A., & Meyer, A.S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1–75.
Losiewicz, B.L. (1992). The effect of frequency on linguistic morphology. PhD dissertation, University of Texas.
Song, J.Y., Demuth, K., Shattuck-Hufnagel, S., & Ménard, L. (2013). The effects of coarticulation and morphological complexity on the production of English coda clusters: Acoustic and articulatory evidence from 2-year-olds and adults using ultrasound. Journal of Phonetics, 41, 281–295.
Sugahara, M., & Turk, A. (2009). Durational correlates of sub-lexical constituent structure in English. Phonology, 26, 477–524.

Keywords: speech production, morphology, articulatory-acoustic relations
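The contour comparison in the analysis above uses SS-ANOVA (Davidson 2006), which fits smoothing splines to each condition's contours and tests where their Bayesian confidence intervals separate. As a much-simplified stand-in for that idea (not the actual SS-ANOVA procedure), the sketch below compares two sets of repeated tongue contours pointwise on a shared x-grid, flagging grid points where approximate 95% confidence bands around the condition means do not overlap; all contour data are invented.

```python
import math

def mean_and_ci(contours):
    """Pointwise mean and ~95% CI half-width (1.96 * SE) across repeated
    contours, each sampled on the same x-grid. A crude stand-in for the
    spline-based confidence bands of SS-ANOVA."""
    n = len(contours)
    m = len(contours[0])
    means, half = [], []
    for j in range(m):
        ys = [c[j] for c in contours]
        mu = sum(ys) / n
        var = sum((y - mu) ** 2 for y in ys) / (n - 1)  # sample variance
        means.append(mu)
        half.append(1.96 * math.sqrt(var / n))
    return means, half

def differ(contours_a, contours_b):
    """Flag grid points where the two conditions' bands do not overlap."""
    ma, ha = mean_and_ci(contours_a)
    mb, hb = mean_and_ci(contours_b)
    return [abs(a - b) > (wa + wb) for a, b, wa, wb in zip(ma, mb, ha, hb)]

# Toy example: three traced contours per condition on a 3-point grid;
# condition b is raised by about 2 units at the last two grid points.
a = [[0.0, 1.0, 1.0], [0.2, 1.1, 0.9], [0.1, 0.9, 1.1]]
b = [[0.1, 3.0, 3.1], [0.0, 3.1, 2.9], [0.2, 2.9, 3.0]]
flags = differ(a, b)
```

Real SS-ANOVA additionally smooths across the grid and computes the bands from the spline fit, so it is more powerful than this pointwise check; the sketch only conveys the shape of the comparison.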
Oral Session 2
Wednesday – October 4, 2017
C-V Coarticulation in Consonants with Multiple Lingual Constrictions
Amanda L. Miller
The Ohio State University, USA
[email protected]
Background: C-V coarticulation has previously been investigated for click consonants by Thomas-Vilakati (2010), who showed that each of the three clicks found in Zulu displays its own manner of production, and that the production of clicks differs in different vowel contexts.
Methods: C-V coarticulation in monosyllabic words containing initial clicks and /i:/ vowels is investigated in Mangetti Dune !Xung with 114 fps lingual ultrasound and acoustic data that is time-aligned to the 9 ms frame, collected using the CHAUSA method (Miller & Finch 2011). Three constriction locations were measured, as in Articulatory Phonology (Browman and Goldstein 1990): Tongue Tip / Blade (TT), Tongue Body (TB) and Tongue Root (TR). First and second formant measurements were taken over the first half of the vowel, by which time the /i/ vowels following all 4 clicks had reached their peak F2 values. The data collection method and the alignment of the ultrasound and acoustic data are described in Miller and Finch (2011), and the data analysis methodology is described in Miller (2016). Four speakers participated in the study: 2 male speakers, Rooi Jenggu Fransisko (JF) and Martin ||Oce Aromo (MA), and 2 female speakers, Sabine Towe Riem (SR) and Caroline Thumbo Kaleyi (TK). Participants were all in their early twenties at the time of recording.
Results and discussion: Vowels following clicks have three lingual gestures involving the tongue tip/blade (TT), tongue body (TB), and tongue root (TR). TT and TB constrictions carried over from the clicks merge into a single vowel constriction at consonant-specific rates (see Figure 1). Second formant (F2) values distinguish each word type through the vowel midpoint (see Figure 2). In regression analyses, the TB Constriction Location and the TR Constriction Location best predict F2 for alveolar click initial words, while the TT Constriction Location best predicts F2 values for the dental and palatal click initial words. The closest vowel constriction location best predicts F2 throughout the vowel. The more open constriction is acoustically inert. I argue that the coarticulation patterns are prosodically controlled. Dental and palatal clicks have TB and TR gestures associated with the syllable onset, while TB and TR gestures are aligned to the right edge of the first mora following the alveolar click.
Figure 1: Plots of the TT, TB and TR Constriction Locations over the first half of the /i/ vowel in all 4 click initial words
Figure 2: First and Second Formant values throughout the /i/ vowel following the four contrastive click types (red lines show point at which articulatory plots end)
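The per-click regressions of F2 on constriction location can be illustrated with a minimal least-squares sketch. The function name and the numbers below are hypothetical stand-ins, not the study's data; the actual analysis regresses F2 on the measured TT, TB and TR Constriction Locations:

```python
def fit_f2_model(xs, ys):
    """Ordinary least-squares fit of F2 (Hz) on a single constriction
    location measure; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical values: F2 rising as the constriction measure increases.
loc = [0.0, 1.0, 2.0, 3.0]
f2 = [2000.0, 2090.0, 2210.0, 2300.0]
slope, intercept, r2 = fit_f2_model(loc, f2)
```

Comparing the fit across the TT, TB and TR predictors for each click type is the logic behind identifying which constriction best predicts F2.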
References
Browman, C. & Goldstein, L. (1990). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. E. Beckman (eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech. Cambridge: Cambridge University Press.
Miller, A. (2016). Posterior lingual gestures and tongue shape in Mangetti Dune !Xung clicks. Journal of Phonetics, 55, 119–148.
Miller, A. & Finch, K. (2011). Corrected High frame rate Anchored Ultrasound with Software Alignment. Journal of Speech, Language, and Hearing Research, April 2011, 471–486.
Thomas-Vilakati, K. D. (2010). Coproduction and Coarticulation in IsiZulu Clicks. University of California Publications in Linguistics, Volume 144. Berkeley and Los Angeles, CA: University of California Press.
Keywords: articulatory-acoustic relations, C-V coarticulation, click consonants
Articulatory Constraints and Sensitivity to Coarticulation of Arabic Pharyngealization
Sawsan Alwabari
Department of Linguistics, University of Ottawa, Canada
[email protected]
Background: Articulatory requirements of a segment can restrict or enhance its sensitivity to contextual variation such as coarticulation. The degree of articulatory constraints (DAC) model (Recasens, 2014) proposed that the magnitude and temporal extent of coarticulation at a specific region of the vocal tract can be attributed to the active participation of that region in producing the affected segment itself. The purpose of this study is to present a novel articulatory account of coarticulation that has never been addressed as a C-to-C process and to examine whether DAC principles apply to coarticulatory pharyngealization in Arabic. This language maintains a phonemic contrast between plain /t s/ and pharyngealized /tˤ sˤ/, whose coarticulatory effect can extend to word boundaries (Davis, 1995). Most researchers agree that pharyngealization involves retraction of the tongue root toward the pharyngeal wall (Laufer & Baer, 1988), whereas others argue that both the tongue root and the dorsum are retracted (Zawaydeh & Jong, 2011).
Research questions: The questions are, thus: if the dorsum is retracted in producing pharyngealization (either as a secondary articulation or as a result of mechanical dependency with the tongue root), does dorsal raising in velars and palatals impede coarticulatory pharyngealization? Does this articulatory conflict make these consonants less susceptible to coarticulation than labials?
Methods: The wordlist consists of 44 words containing the sequence C1aC2, where C1 is either labial /b f/, velar /ɡ x/ or palatal /j ʃ/ and C2 is either pharyngealized /tˤ sˤ/ or plain /t s/. The intervening vowel is restricted to /a aː/ since it is the vowel most sensitive to pharyngealization. The wordlist is controlled for stress, C1-C2 syllable position, syllabic distance between C1 and C2, and the exclusion of other guttural consonants. Examples of (near-)minimal pairs of the three C1 places of articulation are:
• Labial: /baːshaː/ ‘kissed her’, /baːsˤhaː/ ‘her bus’.
• Velar: /ɡaːshaː/ ‘measured her’, /ɡaːsˤhaː/ ‘cut her’.
• Palatal: /ʃatarhaː/ ‘cut her’, /ʃatˤarhaː/ ‘cut her’.
Ultrasound data is collected from 11 native Arabian Peninsula speakers using a Terason T3000 running Ultraspeech 1.2, which generates 320×240 px images of the mid-sagittal plane at 60 fps. An ultrasound stabilization headset was used to immobilize the transducer. Audio is recorded using two Shure Beta 53 microphones, connected to Ultraspeech or to an external PC via a Shure X2u preamp. Occlusal plane images are generated by placing a tongue depressor against the teeth and tongue. Each participant read the 44 words embedded in a carrier sentence, repeated 5 times. The sample size is determined according to a power analysis on preliminary results. After synchronizing ultrasound images with audio, specific frames of interest from C1 and C2 are selected at 50% of stop closure or fricative turbulence, at stop release, and at 25 and 75% of the fricative. Tongue surfaces are traced using Palatoglossatron 1.0 and extracted in pixels.
Results and discussion: SS-ANOVA is conducted on tongue contours in polar coordinates (Mielke, 2015) but plotted in Cartesian coordinates. Since this is work in progress, results presented here are from 2 speakers at 50% of stop closure and fricative.
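The polar-coordinate representation used for the SS-ANOVA comparison (Mielke, 2015) re-expresses each (x, y) contour point as an angle and radius about a fixed virtual origin. A minimal sketch, with the origin value as a placeholder (in practice it is placed below the contours, e.g. near the transducer position):

```python
import math

def to_polar(contour, origin):
    """Convert (x, y) tongue-contour points to (angle, radius) pairs
    about a fixed virtual origin."""
    ox, oy = origin
    return [(math.atan2(y - oy, x - ox), math.hypot(x - ox, y - oy))
            for x, y in contour]

# Toy contour, origin at (0, 0).
polar = to_polar([(1.0, 1.0), (2.0, 0.0)], (0.0, 0.0))
```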
Figure 1: Tongue trajectories and 95% CI for plain & pharyngealized C2, and the three categories of C1. Panels: a) labial C1 & plain/phar. C2; b) palatal C1 & plain/phar. C2; c) velar C1 & plain/phar. C2; d) three C1 places before phar. C2.
(Generalized Additive Model results from all 11 speakers to come soon.) Results include:
• Unlike plain C2 /t s/, pharyngealized C2 /tˤ sˤ/ requires retraction of the tongue root but not of the dorsum (Fig. 1a-c).
• Labials /b f/ exhibit a coarticulatory effect smaller than predicted (Fig. 1a).
• Consistent with DAC, palatals /j ʃ/ and velars /ɡ x/ exhibit root advancement in both plain and pharyngealized contexts, suggesting that the dorsal raising involved in their production restricts coarticulation from Cˤ2 (Fig. 1b-c).
• In line with DAC predictions, labials demonstrate greater sensitivity to pharyngealization (as shown in tongue root retraction) than palatals and velars (Fig. 1d).
References
Davis, S. (1995). Emphasis spread in Arabic and grounded phonology. Linguistic Inquiry, 26(3), 465–498.
Laufer, A. & Baer, T. (1988). The emphatic and pharyngeal sounds in Hebrew and in Arabic. Language and Speech, 31(2).
Mielke, J. (2015). An ultrasound study of Canadian French rhotic vowels with polar smoothing spline comparisons. JASA, 137.
Recasens, D. (2014). Coarticulation and sound change in Romance. Amsterdam: John Benjamins Publishing Company.
Zawaydeh, B. A. & Jong, K. (2011). The phonetics of localising uvularization in Ammani-Jordanian Arabic: An acoustic study. In Z. Hassan & B. Heselwood (eds.), Instrumental studies in Arabic phonetics. Amsterdam: John Benjamins Publishing.
Keywords: speech production, coarticulation, articulatory constraints
Oral Session 3
Wednesday – October 4, 2017
Using ultrasound to investigate articulatory variation in the GOOSE vowel in the British Isles
Eleanor Lawson1, Jane Stuart-Smith2, Lydia Mills1
1 CASL laboratory, Queen Margaret University, Edinburgh, UK
2 GULP laboratory, University of Glasgow, UK
[email protected], [email protected]
Background: It might seem that little is to be gained from articulatory analysis of vocalic accent variation, where recording is generally more time-consuming and invasive and where smaller numbers of speakers and tokens are generally obtained. However, acoustic-only studies of vowels risk conflating variants that might be, performatively, very different. For example, Harrington et al. (2011) pointed out that fronted GOOSE in Standard Southern British English (SSBE) could result from tongue-body fronting, lip unrounding, or a combination of both. They found that tongue-body fronting was responsible for F2-raising in SSBE. Ferragne and Pellegrino's (2010) acoustic survey of vowel systems of British Isles Englishes identified normalised F2 values similar to those of SSBE in different varieties, including Glaswegian English, but fronted GOOSE in Central Scottish English (CSE) is a much older sound change than in SSBE (Johnston, 1997), and traditional descriptions of GOOSE in CSE mention that it has a more endolabial style of lip rounding than "standard" (Anglo-English) speech (see McAllister, 1938). Our study presents a method for measuring the articulatory parameters of the GOOSE vowel using an audio-UTI and lip video corpus of British Isles Englishes, to determine whether articulatory analysis provides a finer-grained view of vocalic variation than acoustic analysis.
Methods: We used an audio-UTI and lip video corpus (2012-14), comprising 10 male and 10 female speakers aged 20-35 from England (8 speakers), the Republic of Ireland (3), Northern Ireland (1) and Scotland (8).
We adapted tongue body measurement and normalization techniques from Scobbie et al. (2012), measuring GOOSE relative to the FLEECE anchor vowel and normalizing using the corner vowels FLEECE and TRAP and /w/ as the high-back corner "vowel". Single UTI frames were annotated, located temporally at the midpoint of monophthongal GOOSE vowels, or at the middle of the second element of diphthongal variants. Splines were fitted to the tongue surface using AAA and exported as Cartesian coordinates for automatic measurement of height and backness using R. Lip protrusion was measured from profile lip video and normalised. For comparison, single-point acoustic measures were also taken at the same time point as the tongue-body measures and normalised. Normalised tongue body and acoustic measures were plotted on scatter plots for comparison, and correlation tests were carried out for acoustic and articulatory measures. Results: In the articulatory and acoustic datasets, two GOOSE-vowel groups emerged: (Group 1) England and the Irish Republic, (Group 2) Scotland and N. Ireland. Articulatory and acoustic plots both showed height differences between groups 1 and 2. However, while acoustic plots showed overlap in the F2 dimension between regional speaker groups, articulatory plots showed a clear differentiation in the horizontal dimension (see Fig. 1), with group 2 speakers having a backer tongue body position than group 1 speakers. Differences also emerged between groups 1 and 2 regarding lip protrusion, with group 1 showing much more protrusion than group 2.
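The corner-vowel normalization can be sketched as proportional scaling between anchors. This is an illustration of the general idea with hypothetical coordinate values and function names, not the exact published procedure of Scobbie et al. (2012):

```python
def normalise_backness(goose_x, fleece_x, w_x):
    """Frontness of GOOSE as a proportion of the FLEECE-to-/w/ span:
    0 = as front as FLEECE, 1 = as back as /w/."""
    return (goose_x - fleece_x) / (w_x - fleece_x)

def normalise_height(goose_y, trap_y, fleece_y):
    """Height of GOOSE between the TRAP (0) and FLEECE (1) anchors."""
    return (goose_y - trap_y) / (fleece_y - trap_y)

# Hypothetical tongue-body coordinates (arbitrary units).
backness = normalise_backness(6.0, 2.0, 10.0)   # halfway between anchors
height = normalise_height(8.0, 0.0, 10.0)       # fairly high vowel
```

Expressing each speaker's GOOSE relative to their own anchor vowels is what makes tongue positions comparable across speakers and probe placements.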
Figure 1: Scatterplot of normalised tongue body height and frontness for GOOSE.
Thus, similar acoustic values were achieved by having a fronter tongue body and more protruded lips (Group 1), or a backer tongue body and less protruded lips (Group 2). Spearman's correlation tests with Bonferroni corrections showed strong correlations between articulatory and acoustic height and a strong negative correlation between acoustic frontness and lip protrusion, but no correlation between articulatory and acoustic frontness. Our study highlights the deficiencies of a purely acoustic approach to diatopic vocalic variation.
References
Ferragne, E. & Pellegrino, F. (2010). Formant frequencies of vowels in 13 accents of British English. Journal of the International Phonetic Association, 1, 1–34.
Johnston, P. A. (1997). Older Scots phonology and its regional variation. In C. Jones (ed.), The Edinburgh History of the Scots Language. Edinburgh: Edinburgh University Press, pp. 47–111.
McAllister, A. (1938). A Year's Course in Speech Training. London: University of London Press.
Scobbie, J. M., Stuart-Smith, J. & Lawson, E. (2012). Back to front: a socially-stratified ultrasound tongue imaging study of Scottish English /u/. Rivista di Linguistica / Italian Journal of Linguistics, Special Issue: Articulatory techniques for sociophonetic research, 1, 103–148.
Keywords: dialectology, GOOSE-fronting
Vowel duration and tongue root advancement in Italian and Polish
Stefano Coretta
University of Manchester, UK
[email protected]
Background: Previous research shows that tongue root position in obstruents is conditioned by voicing, such that voiced consonants show advancement of the tongue root (Westbury 1983). Ahn (2015) further argues that the advancement contributes to shorter VOT in stops. In this paper, I show that root advancement is also related to another correlate of voicing, i.e. the duration of the preceding vowel. In several languages, including Italian, vowels followed by voiced stops are longer than those followed by voiceless stops (Farnetani & Kori 1986). Polish, on the other hand, lacks this effect (Keating 1984). The data collected for this study show that voiced stops in Italian are produced with root advancement. In contrast, voiceless and voiced stops in Polish had the same root position. I propose an articulatory account in which the presence of root advancement in the voiced stops of Italian is responsible for the lengthening of the preceding vowel, due to the time required for the tongue to reach an advanced position.
Methods: Two speakers of Italian (both male) and two speakers of Polish (1 female, 1 male) were recorded using a TELEMED Echo Blaster 128 unit with a TELEMED C3.5/20/128Z-3 ultrasonic transducer, and a Movo LV4-O2 lavalier microphone connected to an Articulate Instruments system. The transducer was placed under the submental triangle, perpendicular to the mid-sagittal plane, and stabilised with an Articulate Instruments headset. Ultrasound and audio data were acquired and synchronised with Articulate Assistant Advanced (AAA, v. 2.17.1, Articulate Instruments Ltd. 2011). The target words were of the form C1V1C2V1, where C1 = /p/, V1 = /a, o, u/, C2 = /t, d, k, g/. The words were embedded in prosodically similar sentences (Italian Dico X lentamente ‘I say X slowly’, and Polish Mówię X teraz ‘I say X now’). Automatic tongue contour tracking was performed in AAA, with subsequent manual correction. Tongue contours were extracted in Cartesian coordinates at two time points: (1) at the time of C2 closure, as identified from the acoustic signal, and (2) at the time of maximum constriction, which corresponds to the time of the local minimum in the instantaneous velocity profile of the tongue tip/dorsum (Strycharczuk & Scobbie 2015). The contours were normalised by applying offsetting and rotation relative to the participant’s occlusal plane (Scobbie et al. 2011).
Results: Since the performance of spline tracking for /u/ was poor, data from this vowel were not included in the analysis. According to generalised additive mixed models (GAMMs), the tongue root at maximum constriction was significantly more anterior in voiced consonants in Italian, but not in Polish (Figure 1). Moreover, advancement was also found at consonant closure in Italian. Finally, a separate GAMM confirmed that the tongue root in the voiced consonants of Italian was less advanced at closure than at maximum constriction.
Discussion: The fact that root advancement in Italian was found already at consonant closure indicates that advancement is initiated during the articulation of the vowel, before closure is achieved. This finding is compatible with the idea that vowels are longer before voiced consonants because more time is needed for the tongue to adjust compared to voiceless stops (see Halle & Stevens 1967, who discuss an articulatory explanation along the same lines, although relating it to laryngeal, rather than lingual, configurations). The absence of a
difference in vowel duration in Polish could then be attributed to the absence of root adjustments, as shown by the results of the present study.
References
Ahn, S. (2015). The role of the tongue root in phonation of American English stops. Paper presented at Ultrafest VII.
Articulate Instruments Ltd. (2011). Articulate Assistant Advanced user guide. Version 2.16.
Farnetani, E. & Kori, S. (1986). Effects of syllable and word structure on segmental durations in spoken Italian. Speech Communication, 5(1), 17–34.
Halle, M. & Stevens, K. (1967). Mechanism of glottal vibration for vowels and consonants. The Journal of the Acoustical Society of America, 41(6), 1613–1613.
Keating, P. A. (1984). Universal phonetics and the organization of grammars. UCLA Working Papers in Phonetics, 59.
Scobbie, J. M., Lawson, E., Cowen, S., Cleland, J. & Wrench, A. A. (2011). A common co-ordinate system for mid-sagittal articulatory measurement. QMU CASL Working Papers, WP-20.
Strycharczuk, P. & Scobbie, J. M. (2015). Velocity measures in ultrasound data: Gestural timing of post-vocalic /l/ in English. Proceedings of the 18th International Congress of Phonetic Sciences.
Westbury, J. R. (1983). Enlargement of the supraglottal cavity and its relation to stop consonant voicing. The Journal of the Acoustical Society of America, 73(4), 1322–1336.
Keywords: speech production, vowel duration, tongue root
Figure 1: Tongue contour in voiceless and voiced stops at maximum constriction in Italian (left) and Polish (right). When the confidence intervals of the difference smooth (grey) do not cross 0 on the y-axis, they indicate a significant difference (red line on the x-axis).
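The velocity criterion for locating maximum constriction (Strycharczuk & Scobbie 2015) can be sketched for a hypothetical one-dimensional sequence of tongue positions sampled at equal intervals: compute the instantaneous speed between frames, then find the interior local minimum of that profile.

```python
def local_velocity_minimum(positions, dt=1.0):
    """Return (index, speeds): instantaneous speeds between successive
    positions, and the index of the first interior local minimum."""
    speeds = [abs(b - a) / dt for a, b in zip(positions, positions[1:])]
    for i in range(1, len(speeds) - 1):
        if speeds[i] <= speeds[i - 1] and speeds[i] <= speeds[i + 1]:
            return i, speeds
    return None, speeds

# Hypothetical trajectory: deceleration into the constriction, then release.
idx, speeds = local_velocity_minimum([0.0, 3.0, 5.0, 6.0, 6.2, 6.8, 8.0])
```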
Oral Session 4
Thursday – October 5, 2017
A 3D biomechanical equilibrium model of the tongue: Can sparse stepwise control provide fluid movement?
Alan Wrench1,2, Peter Balch
1 Articulate Instruments Ltd., UK
2 Queen Margaret University, UK
[email protected]
Background: A project to predict tongue contour shapes from partial information in an ultrasound image by developing a 3D model has grown in scope. We are now investigating whether our model can inform us about the neuromuscular control signals that govern lingual articulation. We have developed software that allows us to manually sculpt muscles and rigid hinged structures in 3D space. We have used our knowledge of the anatomy of the tongue to create a 3D hexahedral mesh and assign muscle fibres within it. These muscle fibres can be grouped and controlled as a group by setting a target length λ, defined as a percentage of rest length. When a λ value is changed, it takes a finite amount of time for the model to settle to a new equilibrium position. The interplay between muscles means that once equilibrium is reached, the λ length and the actual length l of each muscle will not be the same. Nevertheless, a set of λ values defines a resulting shape and position once equilibrium is reached. Crucially, reaching equilibrium does not happen instantaneously; it takes a significant amount of time, of the order of many milliseconds. This approach is distinct from other tongue models, such as Artisynth (Stavness et al., 2012), which control muscles by assigning a contraction force to specify the length l. In such models there is no time component to the calculation. Once a movement sequence is learnt, control parameters (a set of l values) have an instant effect on the position and must change fluidly to produce fluid movement.
Methods: We recorded a sentence using an EchoB ultrasound system (Articulate Instruments Ltd) at ~54 frames per second. We loaded the ultrasound image sequence into our MyoSim3D software and assigned keyframes to time points where the trajectory of the tongue was observed (subjectively) to change. Using these time points only, we manually assigned muscle target lengths so that the model proceeded to change shape to eventually match the tongue contour at the next keyframe.
Figure 1: Midsagittal 3D model fitted to ultrasound image.
Results and discussion: After fitting the model, we end up with a series of 33 vectors of λ values controlling the fluid movement of the tongue for a 3.7-second utterance. Before proceeding with the discussion, it should be noted that there are many components to this model: the mesh, the fibre assignments, the volume preservation calculation, the rigid body motion, and the modelling of proprioceptive feedback. For our model to fit the data precisely, all of these components would have to be faithful representations. This is more than can be expected at this early stage in our research. The model midsagittal contour does not therefore precisely match the ultrasound tongue surface contour at every time point. Despite this, the experiment does show that sparse discrete changes in control parameters can lead to plausible fluid movement of the tongue. The model displays velocity profiles similar to those observed in EMA studies of real tongue motion. When moving from keyframe to keyframe, the velocity of the tracking points increases then decreases as the target is approached. The larger the step change in λ between keyframes, the bigger the change in the tongue shape and the faster the tracking points move towards the target articulation. It is worth emphasizing that these velocity profiles arise from the instantaneous change in λ values and the resulting transition to equilibrium due to the muscle spindle feedback loop model we have implemented.
Speculation: We can imagine that the relative timing and amplitude of these changes in λ constitute a motor program. In the parlance of gestures and synergies, a vector of altered λ values at a given time point might constitute a gesture, and the subset of muscles whose λ values change at that time point might be interpreted as a synergy. Furthermore, it can be noted that increasing the overall rate may result in gestural undershoot if the model is only part way to its equilibrium point before the next change in λ values occurs. However, note that relative timing is part of the program, and a gesture (i.e. a λ vector) does not necessarily equate to a phonological unit. Rather, a sequence of vectors might constitute a motor program for a unit of any size: phone (diphthongs have two gestures), diphone, triphone, syllable, word or phrase. Our model shares some of the concepts and language of Task Dynamics (Saltzman and Munhall, 1989) but is based on the physiology of neurons, muscles and proprioception rather than damped oscillators with stiffness and damping parameters. Learning and monitoring of these gestures is most likely performed via sensory feedback.
References
Saltzman, E. L. and Munhall, K. G. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology, 1(4), 333–382.
Stavness, I., Lloyd, J. E. and Fels, S. (2012). Automatic prediction of tongue muscle activations using a finite element model. Journal of Biomechanics, 45, 2841–2848.
Keywords: biomechanical modeling, speech motor control
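The contrast between instantaneous control parameters and equilibrium-seeking dynamics can be illustrated with a toy second-order system. This is a deliberately crude stand-in, not the MyoSim3D muscle model; the constants k, c and dt are arbitrary illustrative values. After a step change in the target, position settles gradually and speed rises and then falls, rather than jumping:

```python
def settle(x0, target, k=50.0, c=12.0, dt=0.001, steps=800):
    """Unit mass driven toward `target` by a spring (stiffness k) with
    damping c, integrated with forward Euler; returns positions, speeds."""
    x, v = x0, 0.0
    xs, speeds = [], []
    for _ in range(steps):
        a = k * (target - x) - c * v
        v += a * dt
        x += v * dt
        xs.append(x)
        speeds.append(abs(v))
    return xs, speeds

# Step change in the target from 0 to 1 at t = 0.
xs, speeds = settle(0.0, 1.0)
```

The peak of the speed profile falls in the interior of the movement, qualitatively like the bell-shaped velocity profiles seen in EMA studies.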
Acquisition and Analysis of 3D/4D Ultrasound Recordings of Speech
Steven M. Lulich
Indiana University Bloomington, USA
[email protected]
Background: Most speech research involving ultrasound makes use of two-dimensional midsagittal images (Stone, 2005; Bressmann, 2010), although parasagittal (Gick et al., 2017) and coronal (Stone et al., 1988) images have also been used. Three-dimensional data can be reconstructed from multiple two-dimensional images (Lundberg and Stone, 1999), and 3D/4D data have been acquired and analyzed (Bressmann, 2010). As ultrasound technology continues to improve, new methods are necessary for acquiring and analyzing 3D/4D data. The goal of this paper is to outline how 3D/4D speech ultrasound data can be effectively acquired and efficiently analyzed, based on 4 years of experience working with a Philips EpiQ 7G 3D/4D xMatrix ultrasound system.
Data Acquisition: Three aspects of data acquisition are important for consideration. First, because the cost of 3D/4D systems is still very high (list prices are more than $200,000 USD), access to such systems remains a significant limitation. Second, 3D/4D systems are not built for research, and audio speech data cannot yet be conveniently synchronized with the ultrasound. A widely adopted solution in speech ultrasound research is to use a computer to grab frames directly from the ultrasound system while simultaneously recording audio. A second option, implemented at Indiana University, is as follows. The Philips EpiQ 7G system includes a USB programmable three-switch foot pedal. The foot pedal was dismantled and a BNC cable was spliced with the output of one of the switches upstream from the USB circuitry. The foot pedal with spliced BNC output was then reassembled, and the BNC cable was connected to a separate computer's DAQ card. Pressing the spliced switch triggers both the ultrasound system and the computer simultaneously to start and stop recording. The ultrasound system has a short variable delay at the beginning of data acquisition, but the end of the acquisition is synchronous with the end of audio recording on the computer. Third, after data acquisition, ultrasound data must be retrieved from the ultrasound system. Frames acquired with a frame-grabber card simply reflect what is displayed on the ultrasound monitor, and therefore generally include 3D/4D data from three planes (sagittal, coronal, and transverse) rather than fully volumetric data. Fully volumetric data can be exported from the ultrasound system to a DICOM server, a CD-ROM, or a USB thumb drive.
Data Analysis: Proprietary software is available for analyzing 3D/4D ultrasound data, but it is limited for speech research purposes. DICOM readers are available, including MATLAB's "dicomread" function, but these generally do not allow access to the full volumetric data. A free, open-source MATLAB toolbox called WASL has been developed at Indiana University to make full use of volumetric ultrasound data for research purposes. There are four distinct advantages to using 3D/4D ultrasound in speech research. First, with volumetric data it is possible to view any slice through the image, rather than only one particular pre-defined slice (e.g. the midsagittal plane). This increases the flexibility with which data can be analyzed, since researchers can choose specific slices for analysis and quantification based on the properties of the data themselves. Second, images from orthogonal slices carry information that can help to disambiguate tongue surface features. For example, midsagittal images frequently contain more than one tongue surface contour, in which one contour represents the true surface of the tongue while the other represents the surface of the tongue from a neighboring slice which has "bled over" as an imaging artefact. A coronal slice can be used to unambiguously determine which contour is the artefact. Third, 3D data allow segmentation and visualization of the tongue surface in three dimensions. Three-dimensional visualization of tongue surface deformations opens a direct, intuitive window of observation into tongue biomechanics. Fourth, thanks to the left-right symmetry of the tongue, 3D/4D ultrasound allows image registration (at least in part) on the basis of information in the images themselves, without the use of transducer- or head-stabilization devices.
Challenges and Promise: In spite of left-right symmetry, stabilization-free image registration still poses challenges, especially in the sagittal plane. The effectiveness of using synchronous lateral webcam video to improve sagittal registration is currently being explored. WASL currently relies on manual tongue segmentation, which is laborious and prone to user error. Accurate (semi-)automatic segmentation (Laporte and Ménard, 2015) would significantly shorten the acquisition-to-publication pipeline. Volumetric ultrasound does not image the distal wall of the vocal tract well, e.g. the posterior pharyngeal wall and palate. Combining ultrasound with high-definition volumetric MRI (e.g. using WASL) can aid interpretation of ultrasound images within a structural anatomic context.
References
Bressmann, T. (2010). 2D and 3D ultrasound imaging of the tongue in normal and disordered speech. In B. Maassen & P. Lieshout (eds.), Speech motor control: New developments in basic and applied research. Oxford: Oxford
University Press, pp. 351–370.
Gick, B., Allen, B., Roewer-Després, F. & Stavness, I. (2017). Speaking tongues are actively braced. Journal of Speech, Language, and Hearing Research, 60, 494–506.
Laporte, C. & Ménard, L. (2015). Robust tongue tracking in ultrasound images: a multi-hypothesis approach. INTERSPEECH, pp. 633–637.
Lundberg, A. J. & Stone, M. (1999). Three-dimensional tongue surface reconstruction: Practical considerations for ultrasound data. Journal of the Acoustical Society of America, 106(5), 2858–2867.
Stone, M. (2005). A guide to analyzing tongue motion from ultrasound images. Clinical Linguistics and Phonetics, 19(6-7), 455–501.
Stone, M., Shawker, T. H., Talbot, T. L. & Rich, A. H. (1988). Cross-sectional tongue shape during the production of vowels. Journal of the Acoustical Society of America, 83(4), 1586–1596.
Keywords: methodological research, 3D ultrasound, speech motor control
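The flexibility of viewing any slice through a volume amounts to resampling the data along two in-plane step vectors. A minimal nearest-neighbour sketch with a synthetic volume (a generic illustration, not WASL's implementation):

```python
def extract_slice(volume, origin, u, v, nu, nv):
    """Sample an arbitrary plane from a 3-D volume (nested lists indexed
    volume[z][y][x]) by nearest-neighbour lookup. `origin` is the (x, y, z)
    corner of the slice; `u` and `v` are in-plane step vectors."""
    ox, oy, oz = origin
    img = []
    for j in range(nv):
        row = []
        for i in range(nu):
            x = round(ox + i * u[0] + j * v[0])
            y = round(oy + i * u[1] + j * v[1])
            z = round(oz + i * u[2] + j * v[2])
            row.append(volume[z][y][x])
        img.append(row)
    return img

# Synthetic 3x3x3 volume whose voxel value encodes its own coordinates.
vol = [[[100 * z + 10 * y + x for x in range(3)] for y in range(3)]
       for z in range(3)]
axial = extract_slice(vol, (0, 0, 1), (1, 0, 0), (0, 1, 0), 3, 3)
```

Oblique planes fall out of the same loop by choosing non-axis-aligned step vectors; production tools would interpolate rather than round.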
Oral Session 5
Thursday – October 5, 2017
Quantification of Pivots in Ultrasound Images
D. H. Whalen1,2,3, Mark K. Tiede1, Boram Kim2
1 Haskins Laboratories, USA
2 City University of New York, USA
3 Yale University, USA
[email protected], [email protected], [email protected]
Methods: Sample tongue shapes were collected with the HOCUS image correction system (Whalen et al., 2005). An example is shown in Figure 1, from a speaker of Mandarin. There is clearly a pivot between these two segments, and yet it is also clear that it is not perfect. Indeed, all of Iskarous’s examples had some variability in them, and quantifying that variability should indicate how strong the pivot is. There needs to be significant change in position on both sides of the pivot as well. The pivot will be assessed by comparing distances between the tongue surface traces for points along the majority of each trace. Our goal is to compare as much of the tongue surface as possible, but a consistent issue with ultrasound imaging is that different frames
will capture different sections of the tongue. Thus the beginning of one trace may represent a very different portion of the tongue than the beginning of the next. Traces are made with GetContours (Tiede, Haskins Labs), which computes 100 x-y locations in the speaker’s vocal tract space. To avoid the above-mentioned problem, the first 10 and last 10 points of each trace will be ignored. (This amount can be adjusted upward or downward depending on the characteristics of a particular data set.) The distance between two traces is then calculated for each point within that trace to all other points on the other traces. An average of all the traces is computed, and an average trace (again with 80 points) is created. Then an average cross-trace minimum distance is computed for these 80 points. The nearest point (in x-y space) from each original trace is found, and the smallest distances for that trace to all the others is found. The average distance is then computed for each trace, and the global average is computed. When this is completed, pivots will have small averages, and area of change will have large averages. The 80 points can be displayed in the same way that Iskarous’s (2005) distances were (see his Fig. 5). The magnitude of the average distance at the pivot point will be an indication of the strength of the pivot, as in Iskarous’s original formulation, and pivot vs. arch can be assessed based on the magnitude of the change on either side of the pivot. Results and discussion: Preliminary testing has been promising but incomplete. We will report on specific cases at the conference. We expect that the strength of the pivot will correlate with aspects of a languages phonology. Disordered speech, or speech with a strong 35
Background: Tongue movement in speech is quite complex in some ways, yet must be relatively simple to be easily controlled. Iskarous (2005) found evidence that two basic patterns, the "pivot" and the "arch," could explain the majority of transitions between segments. His evidence was based on x-ray data, and ultrasound gives a comparably extensive view of tongue shape. Although pivots have been tested perceptually (Iskarous, Nam, & Whalen, 2010; Nam, Mooshammer, Iskarous, & Whalen, 2013), they have not been extensively studied articulatorily (but see Kocjančič, 2010; Proctor, 2011). Iskarous (2005) provided a mathematical way of locating the pivot, but it requires reference to the hard structures of the vocal tract. Ultrasound images often lack large sections of the hard structures, so any quantification requires a new method. Such quantification would allow for assessment of various aspects of typical and disordered speech.
Thursday – October 5, 2017
Oral Session 5
Keywords: methodological research, speech production, pivots
Figure 1: Overlaid extracted tongue surfaces from Mandarin diphthong [ai]. Anterior is to the left. Early images are in blue, later ones in green.
foreign accent, could well show weak pivots as well. It is hoped that this formulation will be of enough general use that more extensive tests can be performed on issues of syllable affiliation, language acquisition, and motor control. References Iskarous, K. (2005). Patterns of tongue movement. Journal of Phonetics, 33, 363-381. Iskarous, K., Nam, H., & Whalen, D. H. (2010). Perception of articulatory dynamics from acoustic signatures. Journal of the Acoustical Society of America, 127, 3717-3728. Kocjančič, T. (2010). Ultrasound and acoustic analysis of lingual movement in teenagers with childhood apraxia of speech, control adults and typically developing children. (Ph.D. dissertation), Queen Margaret University. Nam, H., Mooshammer, C., Iskarous, K., & Whalen, D. H. (2013). Hearing tongue loops: Perceptual sensitivity to acoustic signatures of articulatory dynamics. Journal of the Acoustical Society of America, 134, 3808-3817. Proctor, M. (2011). Towards a gestural characterization of liquids: Evidence from Spanish and Russian. Laboratory Phonology, 2, 451-485. doi:10.1515/labphon.2011.017 Whalen, D. H., Iskarous, K., Tiede, M. K., Ostry, D. J., Lehnert-LeHouillier, H., Vatikiotis-Bateson, E., & Hailey, D. S. (2005). HOCUS, the Haskins Optically-Corrected Ultrasound System. Journal of Speech, Language, and Hearing Research, 48, 543-553.
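The cross-trace minimum-distance profile described in the Methods can be sketched as follows (a hypothetical numpy implementation, not the authors' code; the array shape and trim amount follow the description above):

```python
import numpy as np

def pivot_profile(traces, trim=10):
    """Mean cross-trace minimum distance per retained point.

    traces: array of shape (n_traces, 100, 2) of x-y tongue points.
    The first and last `trim` points of each trace are ignored, as in
    the abstract. Small values in the returned profile flag a pivot;
    large values flag an area of change.
    """
    t = np.asarray(traces, dtype=float)[:, trim:-trim, :]  # drop unreliable ends
    n = t.shape[0]
    profile = np.zeros(t.shape[1])
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # distance from every point of trace i to every point of trace j
            d = np.linalg.norm(t[i][:, None, :] - t[j][None, :, :], axis=2)
            profile += d.min(axis=1)  # nearest point on the other trace
            pairs += 1
    return profile / pairs
```

The index of the profile minimum locates the candidate pivot, and the magnitude of the profile on either side of it distinguishes pivot from arch, as described above.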
Oral Session 5
Thursday – October 5, 2017
Towards cross-subject comparison of tongue profiles: A Functional Data Analysis approach
Alessandro Vietti2, Alessia Pini1, Lorenzo Spreafico2, Simone Vantini1
1 MOX – Dept. of Mathematics, Politecnico di Milano, Italy
2 Alpine Laboratory of Phonetic Sciences – ALPS, Free University of Bozen-Bolzano, Italy
[email protected],
[email protected],
[email protected],
[email protected]
specifically for each tongue by means of so-called warping functions, which – separately for each curve – move each point of the domain to a new position (e.g., Vantini 2012). As in the case of the comparison of curves, either the supervised or the unsupervised approach can be used to identify the set of warping functions. In the first case, known landmark points (e.g., anatomical landmarks) are considered for each curve, and the identification of the warping functions is driven by matching these points across curves (Ramsay and Silverman 2005). In the second case, there are no known landmark points, and the identification of the warping functions is driven by maximizing an application-related similarity measure between curves (e.g., Sangalli et al. 2009). Research questions: We hereby test both approaches to extend the statistical analysis to the warping functions and to explore the possible effects of misalignment on the analysis of tongue profiles. In particular, we aim to observe how the lingual articulation of [t] may differ according to the degree of bilingual competence in Italian and Tyrolean (a German dialect). We focus on the lingual articulation of the apico-alveolar consonant when followed by a low vowel because we expect less individual variability and less dorsal coarticulation than with other combinations. We expect differences in tongue root position depending on the degree of competence of the speaker, since in Tyrolean /a/ is produced as the low back vowel [ɑ], while in Italian it is rather a low front vowel [a].
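The idea of a warping function that moves each point of the domain to a new position can be illustrated with a toy landmark-registration sketch (hypothetical Python/numpy code; the cited methods use richer warping families and similarity criteria than this piecewise-linear example):

```python
import numpy as np

def landmark_warp(t, src_landmarks, dst_landmarks):
    """Piecewise-linear warping function h(t): maps each source landmark
    position to the corresponding target position and interpolates
    linearly in between (toy illustration only)."""
    return np.interp(t, src_landmarks, dst_landmarks)

def align_curve(t, y, src_landmarks, dst_landmarks):
    """Resample curve y(t) so that its landmarks line up with the
    target landmarks."""
    # inverse warp: where each output time comes from in the source domain
    g = landmark_warp(t, dst_landmarks, src_landmarks)
    return np.interp(g, t, y)
```

For instance, a curve with a feature at t = 0.3 can be aligned so the feature sits at t = 0.5 while the endpoints of the domain stay fixed; supervised registration drives such warps from known anatomical landmarks, unsupervised registration from a similarity measure.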
Background: In order to perform a statistical comparison of groups of curves (i.e., tongue profiles), two different approaches can be pursued: supervised and unsupervised functional data analysis (Ramsay and Silverman 2005). In the supervised approach, curves are a priori labelled into different groups associated with specific experimental conditions (e.g., allophonic differences, see Vietti et al. 2016). In this case, we typically aim at the identification of group-specific patterns that can distinguish one group from any other. We frame this task by comparing different groups of tongue profiles by means of a non-parametric local null-hypothesis testing procedure for functional data (i.e., multi-aspect interval-wise testing, as discussed in Pini and Vantini 2017 and in Pini et al. 2017). In the unsupervised approach, curves are not a priori labelled into groups; instead, the analysis aims at identifying groups of curves that present group-specific patterns not recurrent in the remaining set of curves. We target this issue by applying a non-parametric clustering technique for functional data (i.e., functional K-mean alignment, Sangalli et al. 2010). In this specific case, both the supervised and the unsupervised approach to functional data analysis are challenged by the possible presence of misalignment in tongue profiles (i.e., physiologically "corresponding points" across tongue profiles are not associated with the same point of the domain). Alignment (aka registration) procedures are used to find an optimal matching of corresponding points across the functions, and they achieve this by warping the domain
Methods: The two functional strategies are tested on a dataset containing ultrasound tongue profiles. Synchronized acoustic and ultrasound data were obtained using the Articulate Assistant Advanced (AAA) software. Ultrasonic recordings were captured by means of an Ultrasonix SonixTablet ultrasound imaging system. The tongue contour was tracked using a convex array transducer (Ultrasonix C9-5/10) at 5 MHz, stabilized by means of a helmet. Ultrasound recordings were collected at a rate of ≈100 Hz with a field of view of 120°. Data were produced by 16 subjects (age x̅ = 28.1, 8M-8F) from the Italian-Tyrolean bilingual region of South Tyrol (Italy). Speakers are divided into three groups according to their age of bilingual acquisition: Italian-dominant sequential bilinguals (5), Tyrolean-dominant sequential bilinguals (5) and simultaneous bilinguals (6). The dataset includes the productions of /t/ in /ta/ sequences in the non-word /tatata/. The non-word was initially intended as a calibration device to better define probe orientation before starting the actual experimental phase (Galatà et al. 2016). During the set-up phase the stimulus /tatata/ was administered twice so as to record at least 7 repetitions of [t], resulting in a dataset of 96 tokens. The analysis is carried out using the R packages fdakma and tongue.analysis (the latter specifically designed for tongue profile comparison). References Pini, A., Spreafico, L., Vantini, S., & Vietti, A. (2017). Multi-aspect local inference for functional data: Analysis of ultrasound tongue profiles. Tech. Rep., MOX, Dept. of Mathematics, Politecnico di Milano. Pini, A., & Vantini, S. (2017). Interval-wise testing for functional data. Journal of Nonparametric Statistics, in press. Galatà, V., Spreafico, L., Vietti, A., & Kaland, C. (2016). An acoustic analysis of /r/ in Tyrolean. Interspeech 2016, 1002-1006. Ramsay, J., & Silverman, B. (2005). Functional data analysis. Dordrecht: Springer.
Sangalli, L., Secchi, P., Vantini, S., & Veneziani, A. (2009). A case study in exploratory functional data analysis: Geometrical features of the internal carotid artery. Journal of the American Statistical Association, 104(485), 37-48. Sangalli, L., Secchi, P., Vantini, S., & Vitelli, V. (2010). k-mean alignment for curve clustering. Computational Statistics and Data Analysis, 54, 1219-1233. Vantini, S. (2012). On the definition of phase and amplitude variability in functional data analysis. Test, 21(4), 676-696. Vietti, A., Spreafico, L., & Galatà, V. (2015). An ultrasound study on the phonetic allophony of Tyrolean /r/. Proceedings of the 18th ICPhS, Paper number 0779.1-4. Keywords: methodological research, functional data analysis, bilingual speech
Oral Session 5
Thursday – October 5, 2017
Dorsal Crescents: Area and radius-based mid-sagittal measurements of comparative velarity
James M Scobbie1, Joanne Cleland2
1 Queen Margaret University, Scotland
2 University of Strathclyde, Scotland
[email protected],
[email protected]
ison based on "dorsal crescents", specifically "KT crescents", in which a speaker's contrastive tongue dorsum position for a velar stop (/k/ or /g/) in one word is compared to the redundant dorsal position in the contrasting alveolar stop (/t/ or /d/) in another, matching word. The crescent is delimited by the two spatial crossing points found when a velar and an alveolar tongue surface are overlaid in articulatory space (Figure 1). This paper reports on the dorsal crescent for English intervocalic /k/ vs. /t/, at the burst, in the context of three corner vowels, in data collected from a group of 30 children. Methods: Thirty typically-developing children were measured, recruited as part (Group 1) of the ULTRAX project. The average age was 9.6 years, ranging from 5;7 to 12;8. Each child pronounced one token of English /ata/, /iti/, /oto/, and /aka/, /iki/, /oko/ as part of a larger dataset. Most children had a Scottish accent of English. The probe was held steady with an Articulate Instruments headset. For the high-speed ultrasound system's frame rate, acoustic synchronisation and spatial resolution, see Wrench and Scobbie (2016). The data were hand-segmented, with the stop bursts being labelled and a tongue spline fitted semi-automatically at each stop burst using AAA software. Tongue surfaces at each burst were exported as a series of radial distances to Excel, using the AAA measurement fan grid of 42 radii, centred on the probe. For each of the three /kt/ pairs per speaker, each radial difference in the dorsal crescent between /k/ and /t/ was calculated (and the maximum difference's size and location noted). The area of the dorsal crescent was estimated as the
Background: What is the size (and nature) of the dorsal gesture needed to achieve a constriction of complete closure for a velar stop? Our angle on this topic is informed by our desire to understand the articulatory changes observed during the successful treatment of a relatively common developmental disorder/delay (cf. also typical development in younger children) which involves /k/ undergoing a "velar fronting" process, so that it sounds like [t] and indeed may be completely merged with /t/. We wish to efficiently quantify the articulatory extent of velarity (in relation to transcribed output) in such cases specifically. The measure should be suitable for tracking longitudinal change in the contrast during clinical intervention, and preferably also for acquisitional or cross-linguistic research and even dynamic gestural analysis. Given the nature of ultrasound imaging, providing a quantitative measure of target velarity is more easily approached through analysis of tongue shape and location than through degree of constriction (the relative approximation and/or contact between the tongue and hard and soft palatal areas). However, a limitation of such imaging is that data may be missing near the tongue tip, especially for alveolars, and from the most posterior (root) parts of the tongue, making a reliable operational definition of the dorsal area harder. Contrast size is context-sensitive. Velarity can be measured relative to the other articulations in a word, e.g. a flanking vowel or consonant ([aka], [əkə], [sk]); or in comparison to a different comparator segment; or against a neutral or average tongue configuration, including the active vocal tract hull. Here we take the middle route, proposing a contrastive compar-
Figure 1: Illustrative dorsal crescent in an /a/ context calculated from pooled /k/ (solid markers) and /t/. The origin is the centre of the micro-convex probe. The right panel zooms in on the KT crescent showing 13 difference radii.
sum of a series of annular sectors centred on each radius, with the (greater) distance to /k/ defining the external circumference and the distance to /t/ defining the internal one. Results and discussion: KT crescent areas and maximal KT crescent radial differences are in Table 1, which shows smaller differences between /k/ and /t/ in the /i/ environment, in which both are palatalized, compared to the other contexts. The good correlation (R² = 0.66) between the two measures is useful, since the radial difference is more easily computed and is well-defined even if a crescent is ill-formed (e.g. due to missing tongue-tip data), while the area measure is more robust, being unchanged under translation and rotation. (Area is unaffected by different probe placement, e.g. in different sessions, so long as the data are mid-sagittal.) Both measures are useful.
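The annular-sector area and the maximal radial difference can be sketched as follows (a hypothetical Python sketch, not the authors' Excel/AAA workflow; the angular step between fan radii is left as a parameter, since it depends on the fan configuration):

```python
import math

def kt_crescent_area(r_k, r_t, dtheta_deg):
    """Dorsal crescent area as a sum of annular sectors, one per fan
    radius: the /k/ surface distance is the outer radius, the /t/
    distance the inner one. dtheta_deg is the angle between adjacent
    fan radii. Each sector's area is (dtheta / 2) * (r_k^2 - r_t^2).
    """
    dtheta = math.radians(dtheta_deg)
    return sum(0.5 * dtheta * (rk * rk - rt * rt)
               for rk, rt in zip(r_k, r_t)
               if rk > rt)  # only radii inside the crescent contribute

def max_radial_difference(r_k, r_t):
    """Largest /k/ - /t/ difference along any single radius, and its index."""
    diffs = [rk - rt for rk, rt in zip(r_k, r_t)]
    best = max(diffs)
    return best, diffs.index(best)
```

As noted above, the area is invariant under translation and rotation of the probe coordinates, since it depends only on the per-radius distances, while the maximal radial difference remains defined even when the crescent is ill-formed.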
        KT crescent area (mm²)      Maximal radial difference (mm)
        /a/     /i/     /o/         /a/     /i/     /o/
Mean    294     164     272         11.9    7.5     12.1
SD      83      82      90          3.0     3.4     3.8
Min     159     43      52          7.3     3.3     5.9
Max     450     405     554         18.0    16.0    22.0

Table 1: English KT crescent area (mm²), left, and maximal radial differences (mm), right, in a child sample.
We will also qualitatively discuss the shape, width and location of these dorsal crescents, and report on the length of the imaged tongue surface and the depth of the tongue surface from the probe. Some longitudinal changes will be reported, and the KT crescent measures compared to pilot adult data. References Wrench, A.A. and Scobbie, J.M. (2016). Queen Margaret University ultrasound, audio and video multichannel recording facility (2008-2016). Technical Report, CASL Research Centre. Keywords: speech & language acquisition, methodological research, interaction phonology & phonetics
Oral Session 6
Thursday – October 5, 2017
UltraPhonix: Ultrasound Visual Biofeedback for the Treatment of Speech Sound Disorders
Joanne Cleland1, James M. Scobbie2, Zoe Roxburgh2, Cornelia Heyde2 & Alan Wrench3
1 University of Strathclyde, Scotland
2 Queen Margaret University, Scotland
3 Articulate Instruments Ltd., Scotland
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Methods: A single-subject multiple-baseline (3 baselines) design with untreated wordlists was used with 15 children aged 6 to 15 with SSDs affecting vowels and/or lingual consonants in the absence of structural abnormalities. Figure 1 shows the intervention and probe schedule. Ultrasound data were acquired using an Ultrasonix SonixRP machine remotely controlled
via Ethernet from a PC running Articulate Assistant Advanced™ software (Articulate Instruments Ltd, 2014), version 2.15, which internally synchronised the ultrasound and audio data. The echo return data were recorded at ~121 frames per second (fps), i.e. ~8 ms per frame, with a 135-degree field of view (FOV) in the mid-sagittal plane. In contrast to most other intervention research, all ultrasound data were collected using a probe-stabilizing headset. A bespoke version of AAA 2.16 (Articulate Instruments Ltd, 2016) was used for therapy. Each child received 10-12 sessions of U-VBF and was required to perform at 80% accuracy at each level of performance before moving on to a motorically more demanding level (for example, from single-syllable words to disyllabic words). Narrow transcription of wordlists was undertaken and the percentage of targeted segments correct calculated. Prior to intervention, the ultrasound data were analysed both qualitatively and quantitatively to identify errors. Ultrasound analyses involve annotation of target segments at the burst for stops and at the acoustic midpoint for sonorants and fricatives. A spline indicating the tongue surface is then fitted to the image, and tongue splines from different attempts are compared. Results and discussion: Six children were treated for velar fronting; three for post-alveolar fronting; three for the unusual pattern of backing to pharyngeal or glottal; one for production of all syllable onsets as [h]; one for vowel merger; and one for lateralised sibilants. Most children (10/15) achieved the new articulation in the first or second session. Four children took until the 6th to 9th session to
Background: Speech Sound Disorders (SSDs) are the most common type of communication impairment, with recent figures suggesting that 11.5% of eight-year-olds have SSDs ranging from mild distortions to speech that is unintelligible even to close family members (Wren et al., 2016). For children with persistent and intractable SSDs there is a growing body of evidence that ultrasound visual biofeedback (U-VBF) might be a promising approach. However, most studies originate from the USA and Canada, and focus on the remediation of delayed/disordered /r/ production (for example, McAllister Byun et al., 2014). While ultrasound is ideal for visualising /r/ productions, it also offers the ability to visualise a much larger range of consonants and all vowels; for example, Cleland et al. (2015) report success in treating persistent velar fronting and post-alveolar fronting of /ʃ/. This paper will report on the results of the "UltraPhonix" project, designed to test the effectiveness of U-VBF for a wider range of speech sounds in more children than previously reported. The study takes a particular focus on when children who begin intervention not stimulable for a particular speech sound are first able to produce that new articulation. This is an important issue for establishing the dosage of U-VBF and other motor-based therapies.
Figure 1: Probe and Intervention Schedule. B1/2/3: Baselines; Mid: mid-intervention probe; M1/2: maintenance probes.
achieve the new articulation, and one never did. Those children who acquired the new articulation earlier in the therapeutic process were able to integrate it into words and sentences more quickly and to generalise to untreated words more successfully. 13/15 children improved by more than 20 percentage points in the accuracy of targeted segments in untreated wordlists. One child made no improvement, and one moved towards a phonetically closer approximation of the target. Examples of unusual tongue shapes before and after U-VBF will be presented. In conclusion, U-VBF shows promise as a technique for rapid acquisition of new articulations in persistent SSDs. References Articulate Instruments Ltd (2012). Articulate Assistant Advanced User Guide: Version 2.14. Edinburgh, UK: Articulate Instruments Ltd. Cleland, J., Scobbie, J.M. & Wrench, A. (2015). Using ultrasound visual biofeedback to treat persistent primary speech sound disorders. Clinical Linguistics and Phonetics, 1-23. McAllister Byun, T. M., Hitchcock, E. R., & Swartz, M. T. (2014). Retroflex versus bunched in treatment for rhotic misarticulation: Evidence from ultrasound biofeedback intervention. Journal of Speech, Language, and Hearing Research, 57(6), 2116-2130. Preston, J. L., Brick, N., & Landi, N. (2013). Ultrasound biofeedback treatment for persisting childhood apraxia of speech. American Journal of Speech-Language Pathology, 22(4), 627-643. Sjolie, G. M., Leece, M. C., & Preston, J. L. (2016). Acquisition, retention, and generalization of rhotics with and without ultrasound visual feedback. Journal of Communication Disorders, 64, 62-77. Wren, Y., Miller, L. L., Peters, T. J., Emond, A., & Roulstone, S. (2016). Prevalence and predictors of persistent speech sound disorder at
Eight Years Old: Findings From a Population Cohort Study. Journal of Speech, Language, and Hearing Research, 59(4), 647-673. Keywords: speech motor control, speech sound disorders, visual biofeedback
Oral Session 6
Thursday – October 5, 2017
Coarticulation and stability of speech in Italian children who stutter and normally fluent children: an ultrasound study
Giovanna Lenoci, Irene Ricci
Scuola Normale Superiore, Pisa, Italy
[email protected],
[email protected]
results for the younger stutterers of this study. Token-to-token variability across multiple repetitions of the same item was evaluated as a measure of stability in fluent speech. Studies investigating articulatory dynamics in speakers with impaired speech production have obvious importance for advancing our understanding of the disorder itself, which ultimately has implications for diagnosis and treatment of the speech impairment. Preliminary results will be presented for a comparison between stuttering children and a control group. Methods: Four Italian stuttering children (age range: 6-12 years) and 4 normally fluent peers were recruited and recorded. Stimuli were elicited in a phrase repetition task embedded in a child-friendly set-up. The pre-recorded acoustic stimuli contained a disyllabic pseudo-word embedded in a carrier phrase of the type 'La gattina C1V1ba salirà' (literal English translation: 'The little cat C1V1ba will go up'). Within the stressed first syllable (C1V1), C1 is /b/, /d/ or /g/ and V1 one of the three cardinal vowels /a/, /i/ or /u/. Each target syllable was followed by a bilabial consonant to prevent the influence of additional lingual coarticulation. Twelve repetitions of each CV sequence were collected in random order, for a total of 108 utterances recorded per participant. Participants were seated comfortably in a chair, and the ultrasound probe was held under the chin with a head stabilisation unit (Articulate Instruments stabilisation headset) to stabilise the position of the transducer with respect to the head. Recordings were made with an ultrasound system
Background: Developmental stuttering is a fluency disorder that affects a speaker's ability to produce speech fluently. The hallmark characteristics of the disorder (i.e., sound/syllable repetitions, blocks and sound prolongations; Yairi & Ambrose, 2005) strongly suggest breakdowns in the precisely timed and coordinated articulatory movements required for fluent speech (Van Lieshout et al., 2004). Accordingly, the speech motor control system has been argued to be the critical weak link in the chain of events that leads to the production of speech in people who stutter. Empirical support for this view is provided by past literature according to which people who stutter differ from normally fluent speakers in aspects of speech motor dynamics. Anticipatory coarticulation and stability of speech are two critical indexes of speech motor control maturity, and we want to assess whether stuttering speakers, even during perceptually fluent speech, differ from the non-pathological population with respect to these measures. Coarticulation is a crucial mechanism for fluency and one of the most studied aspects in relation to stuttering. Nevertheless, equivocal results have emerged from previous literature, and articulatory data are scarce, especially for children. Our study uses Ultrasound Tongue Imaging to identify the presence of abnormal coarticulatory patterns in a cohort of Italian children who stutter. Anticipatory coarticulation was determined through curve-to-curve distance comparison, following Zharkova, Hewlett & Hardcastle (2011). Previous research has also suggested that people who stutter are less stable (more variable) than typically fluent speakers in the production of speech, and we expect the same
(Mindray UTI system, 30 Hz) with a micro-convex probe (Mindray probe 65EC10EA); the UTI images were synchronised with the audio through a synchronisation unit (SyncBrightUp unit). First, acoustic data were phonetically segmented and labelled using PRAAT. The annotations were imported into the software used for the articulatory analysis (Articulate Assistant Advanced, Articulate Instruments Ltd) for the selection of the relevant frames within each C-V interval. A semi-automatic tongue contour splining was performed in the acoustic interval spanning from the beginning of the consonantal closure to the end of the following vowel. Two measures of articulatory ability are examined, based on curve-to-curve distance (Zharkova et al. 2011). The stability of speech is examined through the variability of multiple repetitions of the same C within the same vowel context (e.g., /da/); by contrast, variability in production across different vowel contexts is taken as an index of coarticulation (Zharkova & Hewlett, 2009). The preliminary results suggest that children who stutter reach fluency in their speech with different articulatory dynamics compared to the control group. References Van Lieshout, P. H. H. M., Hulstijn, W. & Peters, H. F. M. (2004). Searching for the weak link in the speech production chain of people who stutter: A motor skill approach. In Maassen, B., Kent, R., Peters, H. F. M., Van Lieshout, P. H. H. M. & Hulstijn, W. (eds.), Speech motor control in normal and disordered speech. Oxford: Oxford University Press, 313. Yairi, E. & Ambrose, N. (2005). Early Childhood Stuttering: for clinicians by clinicians. Austin, TX: Pro-Ed. Zharkova, N., & Hewlett, N. (2009). Measuring lingual coarticulation from midsagittal tongue contours: Description and example calculations using English /t/ and /ɑ/. Journal of Phonetics, 37, 248. Zharkova, N., Hewlett, N., & Hardcastle, W. J. (2011).
Coarticulation as an indicator of speech motor control development in children: An ultrasound study. Motor Control, 15, 118.
Keywords: stuttering, speech motor control, anticipatory coarticulation.
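The curve-to-curve distance underlying both measures in this abstract can be sketched generically (hypothetical Python code; the published measure of Zharkova et al. 2011 may differ in detail, e.g. in how points on the two contours are paired):

```python
import math

def curve_to_curve_distance(curve_a, curve_b):
    """Symmetrised mean nearest-neighbour distance between two tongue
    contours, each given as a list of (x, y) points. Identical contours
    give 0; larger values indicate greater shape difference."""
    def one_way(p, q):
        # for each point of p, distance to the closest point of q
        return sum(min(math.dist(a, b) for b in q) for a in p) / len(p)
    return 0.5 * (one_way(curve_a, curve_b) + one_way(curve_b, curve_a))
```

Applied to repetitions of the same CV, such a distance indexes token-to-token variability (stability); applied across vowel contexts, it indexes coarticulation, as described in the Methods.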
Oral Session 7
Friday – October 6, 2017
Changes in dynamic lingual coarticulation of /s/ between eleven and thirteen years old
Natalia Zharkova
Queen Margaret University, Edinburgh, UK
[email protected]
Methods: There were 20 participants, ten in each age group: 13-year-olds (aged between 13;0 [years;months] and 13;11) and 11-year-olds (aged between 11;0 and 11;10). The materials were the syllables /sa/ and /si/ produced in a carrier phrase, each repeated five times. Ultrasound data were collected at a frame rate of 100 Hz, synchronized with the acoustic signal. At every ultrasound frame between the onset and offset of /s/, the tongue curve was traced using Articulate Assistant Advanced (Articulate Instruments 2012). Nine tongue curves at equal intervals throughout the consonant were then created, following Zharkova et al. (2014). The LOCa-i and DEI indices were used to quantify changes in tongue shape, similarly to Zharkova (2016). LOCa-i has higher values when the tongue curve is more bunched in the front than in the back, while DEI values increase as the dorsum raises. Linear mixed models (LMMs) were used to determine the presence of any coarticulatory effect. In the event of a coarticulatory effect, its magnitude was measured
within speaker by taking a ratio of the index value in the context of /i/ to that in the context of /a/, and then compared across groups. Results and discussion: Figure 1 presents index values during the consonant for the two age groups.
Figure 1: Boxplots showing the articulatory indices in the two age groups over the duration of /s/, in the two vowel contexts.
It is clear from the figure that the difference between index values across the two vowel contexts increases over the course of the consonant in both age groups. Table 1 shows that while for DEI there was a significant effect throughout the consonant in both age groups, for LOCa-i the younger group did not show the effect until well into the second half of the consonant. As shown by the LMM results on the ratio values for the two articulatory indices, the size of the observed coarticulatory effects increased over the course of the consonant in both age groups (F = 46.88 for LOCa-i, and F = 43.94 for DEI). The effect of age group was not significant in either LMM
Background: Sibilant fricative consonants in English are acquired by children relatively late, and certain aspects of lingual articulation continue to mature throughout childhood and into adolescence (e.g., Munson 2004; Romeo et al. 2013; Zharkova et al. 2014). In particular, fine differences in the timing of lingual coarticulation have been reported between 10-12-year-old children and adults producing voiceless sibilant fricatives (Zharkova et al. 2014). The present study reports ultrasound data on the lingual dynamics of /s/ production in two groups of children aged 11 and 13 years old, focusing on vowel-on-/s/ anticipatory coarticulation.
(F = 0.06 and F = 1.70 for LOCa-i and DEI, respectively), nor was the interaction of age group and time point (F = 1.81 and F = 4.78 for LOCa-i and DEI, respectively), suggesting that the two age groups did not differ in the magnitude of the observed coarticulatory effects. The age-related differences in the timing of the effect on LOCa-i suggest that the ability to control subtle tongue shape changes during the alveolar fricative undergoes development between the age of 11 years old and early adolescence.

Time point    LOCa-i 13yo    LOCa-i 11yo    DEI 13yo    DEI 11yo
1st           5.64           3.19           16.33       13.40
2nd           10.01          1.96           16.27       21.56
3rd           10.99          1.34           16.92       28.82
4th           13.30          0.58           20.10       34.70
5th           19.50          1.64           27.81       36.16
6th           23.35          5.78           35.03       45.79
7th           27.14          10.58          35.20       54.81
8th           30.20          17.73          37.88       61.25
9th           38.60          23.48          38.53       64.98

Table 1: Results from LMMs establishing the presence of a coarticulatory effect. Significant results are in bold font.
References Articulate Instruments Ltd (2012). Articulate Assistant Advanced Ultrasound Module User Guide: Version 2.14. Edinburgh, UK: Articulate Instruments Ltd. Munson, B. (2004). Variability in /s/ production in children and adults: Evidence from dynamic measures of spectral mean. Journal of Speech, Language, and Hearing Research, 47, 58-69. Romeo, R., Hazan, V. & Pettinato, M. (2013). Developmental and gender-related trends of intra-talker variability in consonant production. Journal of the Acoustical Society of America, 134, 3781-3792. Zharkova, N. (2016). Ultrasound and acoustic analysis of sibilant fricatives in preadolescents and adults. Journal of the Acoustical Society of America, 139, 2342-2351.
Zharkova, N., Hewlett, N., Hardcastle, W.J. & Lickley, R. (2014). Spatial and temporal lingual coarticulation and motor control in preadolescents. Journal of Speech, Language, and Hearing Research, 57, 374-388. Keywords: speech & language acquisition, speech production, speech motor control
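The frame-selection and ratio steps in the Methods can be sketched as follows (hypothetical Python code; the function names and the assumption that frame 0 falls at t = 0 s are illustrative, not from the study):

```python
import numpy as np

def equal_interval_frames(onset_s, offset_s, frame_rate_hz=100.0, n=9):
    """Indices of n ultrasound frames spaced at equal intervals between
    consonant onset and offset (frame 0 assumed at t = 0 s)."""
    times = np.linspace(onset_s, offset_s, n)
    return np.rint(times * frame_rate_hz).astype(int)

def coarticulation_ratio(index_i, index_a):
    """Within-speaker magnitude of the vowel effect: ratio of an
    articulatory index (LOCa-i or DEI) in the /i/ context to the
    same index in the /a/ context."""
    return index_i / index_a
```

For a 100 Hz recording, an /s/ from 0.10 s to 0.50 s yields nine frames at 0.05 s intervals; the ratio is then computed per time point and compared across age groups with LMMs, as described above.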
Oral Session 7
Friday – October 6, 2017
Using ultrasound feedback to aid in L2 acquisition of Russian palatalization
Kevin D. Roon1,2, Jaekoo Kang1, D. H. Whalen1,2,3
1 CUNY Graduate Center, USA
2 Haskins Laboratories, USA
3 Yale University, USA
[email protected],
[email protected],
[email protected]
Methods: The stimuli are shown in Table 1.
Productions of these CVCs were recorded using an Ultrasonix SonixTouch ultrasound machine from two native speakers of Moscow Russian (one male, one female). The recordings included a video of the midsagittal view of the tongue, with concurrent audio. The data from the learners for this experiment are currently being collected, and will be analyzed in time for Ultrafest VIII. Each learner performs three tasks: a pre-training repetition task, a training session, and a post-training repetition task. In both repetition tasks, the learners hear (in randomized order) two tokens of each stimulus from Table 1, one produced by each speaker.

Manner      Lower Lip         Tongue Tip
Stop        /pam/–/pʲam/      /tam/–/tʲam/
            /map/–/mapʲ/      /mat/–/matʲ/
Fricative   /fam/–/fʲam/      /sam/–/sʲam/
            /maf/–/mafʲ/      /mas/–/masʲ/

Table 1: Training stimuli used in the experiment.
The learners are instructed to repeat what they heard, and their productions are recorded (audio only). In the training session, learners sit in front of a computer monitor that is next to the display of the ultrasound machine. The computer monitor shows the production of the native speaker producing the non-palatalized consonant, and after a brief pause, the palatalized consonant. The display of the ultrasound machine shows the real-time moving image of the learner’s tongue as they attempt to repeat the pair. The learner can replay the movie of each pair up to 10 times. There are six groups of learners: one who hear only syllable-initial contrasts, one who hear only syllable-final contrasts, one who hear only lower-lip con47
Fr
Background: One challenge facing the learner of a non-native language is producing sounds that are not in the learner’s native language. Explicit articulatory instruction on producing such sounds has been shown to improve learners’ productions, compared to relying solely on mimicking acoustic input (Saito, 2013). While various types of articulation-based instructions have been shown to help non-native productions, one problem that they share is that they do not give learners any means of evaluating whether or how well they are correctly implementing the articulations that they have been instructed to make. The present study addresses that shortcoming. Real-time video feedback using ultrasound imaging of the tongue has been shown to help in the acquisition of specific non-native speech sounds in a handful of studies (Gick et al., 2008; Antolík et al., 2013). The present study investigates the acquisition of consonant palatalization in Russian by naive English speakers. Russian systematically contrasts palatalized vs. non-palatalized consonants across primary oral articulator, manner, voicing, and syllable position, in both stressed and unstressed syllables (Kochetov, 2002). Russian palatalization contrast is often challenging for English speakers to master (Diehm, 1998). We investigate whether participants can produce more native-like palatalized Russian consonants when they practice producing the sounds accompanied by real-time ultrasound video feedback of their own articulations side-by-side with similar videos of a native speaker. We also investigate to what degree learners can generalize palatalization to novel environments.
Friday – October 6, 2017
trasts, one who hear only tongue-tip contrasts, one who hear only stop contrasts, and one who hear only fricative contrasts. Within each group, each learner practices with two trials for each stimulus pair: one pair produced by the male Russian speaker, and the other produced by the female speaker. Both groups produce all stimuli in both repetition tasks. The post-training repetitions therefore include contrasts in trained and untrained environments. Results and discussion: To evaluate the effectiveness of the training, native speakers of Russian will be presented with pairs of productions from the learners, where in each pair one utterance was from the pre-training and the other was from the post-training by the same learner. The evaluators will be instructed that the people that they will listen to were being trained on producing palatalized consonants, and that on each pair they should indicate which of the two utterances was the one that the learner produced after training. If the ultrasound feedback is effective, then correct identification of the utterances from the post-training session should be above chance. We will analyze whether there are any differences between the training groups to determine whether learning the contrast is more difficult in one syllable position vs. the other, with lower-lip vs. tongue-tip consonants, and with stops vs. fricatives. We will also examine how the learners’ ability to generalize is affected by which syllable position/primary articulator/manner they were trained on. References Antolík, T. K., Pillot-Loiseau, C., & Kamiyama, T. (2013). Comparing the effect of pronunciation and ultrasound trainings to pronunciation training only for the improvement of the production of the French /y/-/u/ contrast by four Japanese learners of French. Paper presented at the Ultrafest VI Workshop, Edinburgh. Diehm, E. E. (1998). Gestures and linguistic function in learning Russian: Production and perception studies of Russian palatalized con-
48
Oral Session 7
sonants. (Ph.D.), The Ohio State University, ProQuest/UMI. Gick, B., Bernhardt, B. M., Bacsfalvi, P., & Wilson, I. (2008). Ultrasound imaging applications in second language acquisition. In J. G. Hansen Edwards & M. L. Zampini (Eds.), Phonology and Second Language Acquisition (pp. 309– 322). Amsterdam: John Benjamins. Kochetov, A. (2002). Production, perception, and emergent phonotactic patterns: A case of contrastive palatalization. New York, London: Routledge. Saito, K. (2013). Reexamining effects of form-focused instruction on L2 pronunciation development. Studies in Second Language Acquisition, 35, 1–29. Keywords: speech & language acquisition, palatalization, feedback
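The above-chance criterion for the evaluators' two-alternative forced-choice judgments amounts to an exact one-sided binomial test against a 50% guessing rate. A minimal sketch of that check; the function name and the trial counts in the example are hypothetical illustrations, not figures from the abstract:

```python
from math import comb

def binom_p_above_chance(correct: int, trials: int, p: float = 0.5) -> float:
    """One-sided binomial test: probability of observing at least
    `correct` successes in `trials` if evaluators guess at rate `p`."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

# Hypothetical outcome: evaluators pick the post-training token
# in 70 of 100 pairs; a small p-value supports above-chance identification.
p_value = binom_p_above_chance(70, 100)
```

With 70/100 correct the p-value is far below 0.05, whereas 50/100 gives a p-value near 0.5, i.e. indistinguishable from guessing.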
What the [ɫ] [ɻ] they doing? Ultrasound in the EFL pronunciation classroom
Thomas Kaltenbacher¹,²
¹ The Salzburg Speech Clinic, Austria
² Dept. of Linguistics, University of Salzburg, Austria
[email protected]
Background: How should university teachers of English as a foreign language correct the most persistent pronunciation errors? Firstly, by identifying which sounds learners substitute for the English target phoneme; secondly, by showing their students how and where to pronounce the sounds. The English phoneme inventory contains sounds that are easily explained because their place of articulation is clearly visible, e.g. bilabial, dental and alveolar sounds. Then there are the sounds that are not produced at clearly visible places. Additionally, one may have to contend with an English phoneme competing with a similar first-language phoneme. This paper (work in progress) describes the sounds that learners of English produce when the target sounds are [ɫ] and [ɻ], and suggests how the proper pronunciation of these sounds may be achieved using UTI. The English variants of /r/ cover a wide articulatory range (Scobbie, Lawson, Nakai, Cleland, & Stuart-Smith, 2015), and this study was inspired by an oral paper entitled "Where the [l] [r] they?" (Scobbie & Stuart-Smith, 2006).

Methods: 30 students from the University of Salzburg participating in phonetics, phonology or pronunciation classes were recruited for this study. We used a Mindray DP-2200 Plus Digital Ultrasonic Diagnostic Imaging System with the Articulate Instruments probe stabilization headset (Scobbie, Wrench, & van der Linden, 2008). The data were recorded and analyzed using Articulate Instruments' AAA software. All participants' L1 was (Austrian) German. In a first ultrasound session, the subjects' articulation of German and English phonemes was assessed. The focus was on eliciting a substantial sample of German [l] and – depending on the respective German accent – [R], [ʁ] or [r] sounds. The English items were designed to elicit [ɫ] and [ɹ] articulations – or substitutions of the two. Subjects read out short carrier phrases and sentences with the target sounds occurring in all possible phonological positions. Neither [ɫ] nor [ɹ] is a phoneme of Standard German, so we expected subjects who had not learned the pronunciation of these sounds in English to substitute the closest German phonemes, or to use the retroflex approximant [ɻ] for [ɹ], which can still be found in pronunciation textbooks for German-speaking learners of English. Following this first assessment, participants were invited to a second recording session 4 to 5 weeks later. The second session began with instruction on where the two sounds [ɫ] and [ɹ] are articulated in English. Using UTI, the experimenter pointed out the different places of articulation and provided demonstrations of the English and German sounds. Following this instruction, subjects tried out the sounds with UTI feedback and were then asked to read out the carrier phrases and sentences again. In the second session only English items were used.

Results and discussion: With this EFL pronunciation training study we hope to provide evidence for the benefit of UTI in an adult L2 learner setting. Learners of English with German as their L1 face several problems with [ɫ] and [ɹ], as the German phoneme inventory offers strong competition with different places of articulation. Also, some pronunciation textbooks still provide a [ɻ], which may have become fossilized in learners of English during their early instruction in school. UTI is one of several imaging technologies at hand that may help learners and teachers of EFL. This study supports two ideas: (1) through multi-modal articulation training it is possible to teach students how to produce sounds properly; (2) by making learners aware of the anatomical, physiological and motor aspects, they benefit from motor-sensory feedback, so that native-like pronunciation can be acquired. The sounds [ɫ], [l] and [r], [ɹ] are excellent examples for a UTI application and have been documented in recent L2 research (Tateishi, 2013; Tsui, 2012). Our study offers an intervention perspective on improving English pronunciation and an outlook on future applications of UTI in an adult EFL setting where German is the learners' L1.

References
Scobbie, J., Wrench, A. A., & van der Linden, M. L. (2008). Head-probe stabilisation in ultrasound tongue imaging using a headset to permit natural head movement. Paper presented at the 8th International Seminar on Speech Production (ISSP 2008).
Scobbie, J. M., Lawson, E., Nakai, S., Cleland, J., & Stuart-Smith, J. (2015). Onset vs. coda asymmetry in the articulation of English /r/. Proceedings of the 18th ICPhS, Glasgow.
Scobbie, J. M., & Stuart-Smith, J. (2006). Where the [l] [r] they? An ultrasound tongue imaging study of Scottish English liquids. Paper presented at the British Association of Academic Phoneticians (BAAP), Edinburgh, 10–12 April 2006.
Tateishi, M. (2013). Effects of the use of ultrasound in production training on the perception of English /r/ and /l/ by native Japanese speakers (MA thesis). University of Calgary, Calgary, CA.
Tsui, H. (2012). Ultrasound speech training for Japanese adults learning English as a second language (MSc thesis). University of British Columbia, Vancouver, CA.

Keywords: speech & language acquisition, speech production, English pronunciation training
Oral Session 8
Ultrasound as a tool for language documentation: The production of labio-coronal fricatives in Setswana
Jonathan Havenhill, Elizabeth C. Zsiga, One Tlale Boyer & Stacy Petersen
Georgetown University, USA
[email protected], [email protected], [email protected], [email protected]
Background: Doubly-articulated fricatives are exceptionally rare among the languages of the world. Ladefoged and Maddieson (1996) argue that doubly-articulated fricatives do not occur in any language, but the labio-coronal fricative [ɸ͡ʃ] is reported to occur in Setswana (Cole, 1985; Tlale, 2005). Instrumental phonetic data for this sound are lacking, however, and its status as a true doubly-articulated fricative remains unclear. Ladefoged and Maddieson suggest that all reported cases of doubly-articulated fricatives can be analyzed as sequences ([ɸ+ʃ]) or as secondary articulation ([ɸʲ] or [ʃʷ]). This paper reports the results of an articulatory study of labio-coronal fricatives in Setswana, combining ultrasound tongue imaging with acoustic, video, and aerodynamic analysis.

Methods: Data were collected through fieldwork in two Botswanan villages. Nine native speakers of Setswana participated in data collection. Five (two men) are speakers of the Sengwato dialect, spoken in Shoshong, and four (two men) are speakers of the Sekgatla dialect, spoken in Oodi. Participants produced three repetitions of ten words, with /s, sʷ, ʃ, ɸ͡ʃ, x/ appearing word-initially before the vowels /a/ and /e/. Head movement was mitigated with a chair-mounted head stabilization device, and the ultrasound transducer was affixed beneath the speaker's chin with an articulated arm mounted to a table. Ultrasound data were captured using a SonoSite MTurbo portable ultrasound machine with a C60x 5–2 MHz transducer at a scan depth of 9.2 or 11 cm. Acoustic data were recorded using a Marantz solid-state recorder with a headset condenser microphone and synchronized to the ultrasound signal with an Elgato Video Capture device. Video data were recorded with a Sony camcorder at 30 fps. Aerodynamic data were captured with a Rothenberg mask and pneumotachograph. Still ultrasound and video images were extracted at the points of maximum lingual and labial constriction, respectively. Tongue contours were traced using EdgeTrak (Li, Kambhamettu, & Stone, 2005) and modeled with polar smoothing spline ANOVA (Davidson, 2006; Mielke, 2015). Acoustic measurements were made using Praat (Boersma & Weenink, 2017). Lip configuration was analyzed by calculating the ratio of vertical lip openness to horizontal lip spread.

Figure 1: Lip configurations for [sʷ] (left) and [ɸ͡ʃ] (right).

Results and discussion: Analysis of the ultrasound data reveals that [ɸ͡ʃ] exhibits a laminal post-alveolar constriction similar to that of [ʃ], while both [s] and [sʷ] exhibit apical alveolar constrictions. Representative tongue contours produced by a speaker of Sekgatla are presented in Figure 2. Video data reveal that [ɸ͡ʃ] is produced with lip compression, identical to that observed for [ɸ] but distinct from the rounding observed on [sʷ], as shown in Figure 1. The acoustic data indicate that [ɸ͡ʃ] is distinct from [s], [ʃ], and [sʷ]. Moreover, the spectrum for [ɸ͡ʃ] is steady throughout its duration, suggesting that it is not produced as a sequence of fricatives. While the aerodynamic data are inconclusive, they show a buildup of pressure behind the lips for at least some tokens of [ɸ͡ʃ], suggesting that frication may be produced at both the lingual and labial constrictions.

Figure 2: SSANOVA tongue contours for /s, sʷ, ʃ, ɸ͡ʃ, x/.

Taken together, these results suggest that [ɸ͡ʃ] is best characterized as a double articulation, not as a sequence or as a secondary articulation. The possibility of an articulatory sequence is ruled out by the acoustic analysis, while comparison of the lip configurations for [sʷ] and [ɸ͡ʃ] suggests that the compressed articulation of [ɸ͡ʃ] is distinct from the rounding observed elsewhere in the inventory of Setswana. Despite the challenges inherent in transporting ultrasound equipment to remote locations, the ability to collect ultrasound data in the field is shown to be crucial to establishing the nature of these unusual and previously unstudied sounds.

Acknowledgements: This work was supported by NSF BCS-1052937. Many thanks to Zachary Jaggers, who analyzed the lip video data.

References
Boersma, P., & Weenink, D. (2017). Praat: doing phonetics by computer [Computer program]. Version 6.0.28, retrieved 21 April 2017 from http://www.praat.org/.
Cole, D. T. (1985). An introduction to Tswana grammar. Cape Town: Longman.
Davidson, L. (2006). Comparing tongue shapes from ultrasound imaging using smoothing spline analysis of variance. The Journal of the Acoustical Society of America, 120(1), 407–415.
Ladefoged, P., & Maddieson, I. (1996). The sounds of the world's languages. Oxford, UK: Blackwell.
Li, M., Kambhamettu, C., & Stone, M. (2005). Automatic contour tracking in ultrasound images. Clinical Linguistics & Phonetics, 19(6–7), 545–554.
Mielke, J. (2015). An ultrasound study of Canadian French rhotic vowels with polar smoothing spline comparisons. The Journal of the Acoustical Society of America, 137(5), 2858–2869.
Tlale, O. (2005). The phonetics and phonology of Sengwato, a dialect of Setswana (Doctoral dissertation). Georgetown University, Washington, DC.

Keywords: speech production, language documentation, double articulation
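The openness-to-spread ratio used in the Setswana lip analysis above can be sketched as a small computation over landmark coordinates. This is an illustration only; the landmark names and the example values are assumptions, not the authors' actual measurement pipeline:

```python
def lip_aspect_ratio(upper_lip, lower_lip, left_corner, right_corner):
    """Ratio of vertical lip openness to horizontal lip spread,
    given (x, y) landmark coordinates in image pixels.
    Rounding narrows the spread; compression reduces the openness."""
    openness = abs(upper_lip[1] - lower_lip[1])
    spread = abs(right_corner[0] - left_corner[0])
    return openness / spread

# Hypothetical landmarks from two video frames:
rounded = lip_aspect_ratio((50, 40), (50, 60), (20, 50), (60, 50))     # 20 / 40 = 0.5
compressed = lip_aspect_ratio((50, 48), (50, 52), (10, 50), (90, 50))  # 4 / 80 = 0.05
```

A rounded configuration (narrow spread) yields a higher ratio than a compressed one (small opening over a wide spread), which is the distinction the [sʷ] vs. [ɸ͡ʃ] comparison turns on.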
Micro-variation in a metaphonic system: Acoustic-articulatory evidence from the Tricase dialect
Andrea Calabrese¹, Mirko Grimaldi²
¹ Department of Linguistics, University of Connecticut, USA
² Centro di Ricerca Interdisciplinare sul Linguaggio (CRIL), University of Salento, Lecce, Italy
[email protected], [email protected]
Methods: 6 Tricase native speakers (2 females; mean age 21.6, range, SD 1.21) were recorded with a US Aplio XV machine (Toshiba Medical Systems Corp.): C.R. (f), M.M. (m), L.G. (m), G.E. (m), M.B. (f), G.G. (m). In a soundproof room, the subjects were asked to read out, at a normal rate, word stimuli embedded in the carrier phrase Ieu ticu __ moi 'I say __ now'. 10 stimuli for each vowel and each metaphonic context were acquired (90 stimuli per subject, for a total of 540 stimuli, randomly interchanged between subjects). A Shure SM58-LCE microphone and the CSL 4500 recording system, at a sampling rate of 22.05 kHz with 16-bit precision, were used. For each vowel, total duration as well as F0, F1, F2 and F3 were measured in the vowel steady state (0.025 s) centered at the midpoint (here we focus on the F1 and F2 values only). The US video stream was acquired synchronously with the audio signal by means of an external a/v analog-to-digital acquisition card, and then recorded in real time on a dedicated PC. The probe was rigidly locked into a fixed position on an appropriately adapted microphone stand, and the subject's head was immobilized. The US a/v stream was segmented offline by an automatic software procedure developed at CRIL, on the basis of audio pulses superimposed on the speech signal before and after each sentence during recording. For each segmented sentence, the operator inspected the acoustic waveform and manually placed labels around the time intervals where the relevant vowels occurred, so that the corresponding US frames could be identified and processed with EdgeTrak. An independent t-test was carried out to examine the assimilatory effect of the final vowels [i], [u], [e], [a] on the stressed mid vowels /ɛ/ and /ɔ/ (alpha level p