Elsevier Editorial System(tm) for Journal of Phonetics Manuscript Draft

Manuscript Number: PHON13-60R3

Title: Spatial and dynamic aspects of retroflex production: An ultrasound and EMA study of Kannada geminate stops

Article Type: Research Article

Keywords: retroflex consonants, geminates, speech production, Kannada, ultrasound, articulography, coarticulation

Corresponding Author: Dr. Alexei Kochetov, Ph.D.

Corresponding Author's Institution: University of Toronto

First Author: Alexei Kochetov, Ph.D.

Order of Authors: Alexei Kochetov, Ph.D.; N. Sreedevi, Ph.D.; Midula Kasim, M.Sc.; R. Manjula, Ph.D.

Abstract: This study investigates the production of geminate retroflex stops in Kannada using a combination of ultrasound and articulography. Data obtained from 10 native speakers of the language show that the retroflex gesture is dynamically complex and asymmetrical, involving an anticipatory retraction of the tongue tip, followed by the raising of this articulator towards the hard palate, and a subsequent rapid flapping-out movement during the closure and the release. The retroflex constriction and the forward movement appear to be facilitated by the simultaneous fronting of the posterior tongue body, flattening of the anterior tongue body, and lowering of the jaw. Compared to dental and velar stops, retroflex stops exert extensive anticipatory and perseverative coarticulatory effects on adjacent vowels and inter-speech intervals. With respect to the magnitude of the tongue tip displacement, the anticipatory effects are greater than the perseverative effects. The results of the study thus offer a multi-faceted view of spatial and dynamic aspects of retroflex stop production in Kannada, confirming and extending previous findings for other Dravidian languages. The results also provide support for general models of lingual consonant production and coarticulation.


Spatial and dynamic aspects of retroflex production: An ultrasound and EMA study of Kannada geminate stops

Alexei Kochetov (a), N. Sreedevi (b), Midula Kasim (b), and R. Manjula (b)

(a) University of Toronto, (b) All India Institute of Speech and Hearing

Corresponding author: Alexei Kochetov (University of Toronto) email: [email protected] phone: 1-416-946-3808

1. Introduction

Retroflex consonants are produced with the tongue tip or its underside making a constriction behind the alveolar ridge – in the post-alveolar or palatal area of the roof of the mouth (Laver, 1994; Ladefoged & Maddieson, 1996). Retroflexes are known to be cross-linguistically uncommon. For example, only 36 out of 317 languages (or 11.4%) in the UCLA Phonological Segment Inventory Database (UPSID: Maddieson, 1984) have phonemic retroflex stops, and, peculiarly, most of these languages are spoken in two linguistic areas – South Asia and Australia (Bhat, 1973). Given the relative rarity and the skewed distribution of retroflexes, the phonology and phonetics of these consonants have received considerable attention.

Most previous articulatory phonetic works, predominantly static palatography studies, have focused on the location of the retroflex constriction (post-alveolar or palatal) or on the part of the tongue that is involved in the constriction (the tip or its underside; e.g. Firth, 1948; Balasubramanian, 1972; Nihalani, 1974; Shalev, Ladefoged, & Bhaskararao, 1993; Anderson, 2000). The overall shape of the tongue during the retroflex constriction has not received as much attention, with the exception of a handful of X-ray and more recent MRI studies, mainly of Tamil (e.g. Švarný & Zvelebil, 1955; Ladefoged & Bhaskararao, 1983; Narayanan, Byrd, & Kaun, 1999; Proctor, Goldstein, Byrd, Bresch, & Narayanan, 2009). Further, given methodological limitations, most previous studies have examined exclusively static aspects of retroflex production – the relative position of articulators and/or the location of the contact at a point in time during the constriction. Dynamic aspects, such as the movement of the tongue towards, during, and away from the retroflex constriction, have been relatively under-studied and are still poorly understood. Some of these aspects were recently examined using articulator tracking techniques such as electropalatography and articulography (e.g. Dixit, 1990; McDonough & Johnson, 1997; Narayanan et al., 1999; Tabain, 2009a; Simonsen, Moen, & Cowen, 2008). These works have revealed some complex kinematic properties and coarticulatory patterns involved in the production of retroflex consonants in languages as diverse as Tamil, Hindi, Norwegian, and Central Arrernte. They also showed considerable inter-speaker and cross-language variation in the production of retroflex consonants. Given the relatively small sample sizes of many of these studies (usually 1 to 4 participants), it remains unclear to what extent the observed patterns are representative of the class of retroflex consonants in general, or are language-specific, or perhaps even speaker-specific. Finally, while it has been established that vowels exert strong coarticulatory effects on retroflex consonants (e.g. Dixit, 1990; Krull & Lindblom, 1996; Simonsen et al., 2008), the reverse effect – coarticulation of adjacent vowels and non-adjacent consonants to retroflexes – has hardly been investigated (but see Tabain, 2009a). Such coarticulatory effects, however, are interesting as potential phonetic sources of historical vowel changes next to retroflexes (Bhat, 1974; Hamann, 2003) and retroflex consonant harmony in South Asian and Australian languages (Arsenault, 2012; Gafos, 1999).

The goal of the current study is to investigate spatial and dynamic aspects of the retroflex-dental stop contrast in Kannada, a Dravidian language spoken in the South Indian state of Karnataka. Like other Dravidian languages, Kannada exhibits a phonemic retroflex-dental contrast, with stops (both singletons and geminates) commonly contrasting in word-medial position: /a:ʈa/ ‘play’ vs. /a:ta/ ‘that person’, /aɖu/ ‘to cook’ vs. /adu/ ‘that’, /aɳʈu/ ‘gum’, /tantu/ ‘thread’, /ɡaɳɖu/ ‘male’, /kandu/ ‘brown’, /aʈʈa/ ‘garret’ vs. /atta/ ‘that side’, /aɖɖa/ ‘across’ vs. /adda/ ‘length’ (Upadhyaya, 1972).1 In this study we focus on geminate retroflex and dental stops and examine their production using a combination of articulator imaging and tracking techniques. First, we use ultrasound to investigate changes in the tongue shape over the course of articulation of geminate retroflex stops, comparing them to the tongue shapes during the articulation of geminate dental and velar stops. Second, we employ electromagnetic articulography (EMA) to track the movement of the tongue tip and the jaw in order to explore the gestural dynamics of, and long-range differences between, geminate retroflex and dental articulations. To the best of our knowledge, this is the first articulatory investigation of retroflex consonants using a combination of ultrasound imaging and tracking techniques. It is also, apart from our pilot work, the first articulatory study of Kannada, a widely spoken (by about 35 million speakers; Lewis, 2009), yet relatively phonetically under-studied Dravidian language.

1 As in other languages of South Asia, the anterior coronal stops /t/ and /d/ in Kannada are usually described as dental (i.e. IPA [t̪] and [d̪]), while the other anterior coronals, /s/, /n/, /l/, and /r/, are described as alveolar (Upadhyaya, 1972; Schiffman, 1983; Sridhar, 1990). Here we use the symbols [t] and [d] to denote the dentals for convenience.

2. Articulatory studies of Dravidian retroflexes

Retroflex stops and their dental/alveolar counterparts in Dravidian languages have been investigated in a number of articulatory studies. The focus of this research has been primarily on Tamil (Švarný & Zvelebil, 1955; Ramasubramanian & Thosar, 1971; Balasubramanian & Thananjayarajasingham, 1972; Balasubramanian, 1972; Ladefoged & Bhaskararao, 1983; Krull & Lindblom, 1996; Proctor et al., 2009), and to a lesser extent on other related languages – Telugu (Švarný & Zvelebil, 1955; Ladefoged & Bhaskararao, 1983), Malayalam (Dart & Nihalani, 1999), and Toda (Shalev et al., 1994). In some of these studies, Dravidian retroflexes were compared to similar consonants in other languages – Hindi (Švarný & Zvelebil, 1955; Ladefoged & Bhaskararao, 1983; Krull & Lindblom, 1996) and Swedish (Krull & Lindblom, 1996). Most of the works used static palatography, either as the main method (Ramasubramanian & Thosar, 1971), or in combination with linguograms (Dart & Nihalani, 1999; Shalev et al., 1994), X-rays (Balasubramanian, 1972), or both (Švarný & Zvelebil, 1955). X-ray imaging was also used in Ladefoged & Bhaskararao (1983), while Krull & Lindblom (1996) and Proctor et al. (2009) employed electropalatography and MRI respectively. Notably, the majority of the above-mentioned studies were based on a single speaker of a particular language, with the other studies having samples of two (Krull & Lindblom, 1996; for Tamil), four (Ladefoged & Bhaskararao, 1983; Proctor et al., 2009), five (Shalev et al., 1994), and nine speakers (Dart & Nihalani, 1999). Several other (also single-speaker) studies are of relevance, even though they did not specifically investigate retroflex stops: investigations of Tamil liquids by McDonough & Johnson (1997), who used EPG, static palatography, and acoustics, and by Narayanan et al. (1999), who used MRI, static palatography, and EMA. Scobbie, Punnoose, & Khattab’s (2013) study of Malayalam liquids is also important as one of the first ultrasound studies of retroflexes. One important finding shared by most of these articulatory studies is that Dravidian retroflex stops (and liquids) are produced with the underside of the tongue tip (sub-apical) making a constriction behind the alveolar ridge, and sometimes at the hard palate. This is in contrast with Hindi/Urdu and other Indo-Aryan languages, where the corresponding sounds tend to be produced with lesser retroflexion – as apical post-alveolars or alveolars (Švarný & Zvelebil, 1955; Ladefoged & Bhaskararao, 1983; but see Krull & Lindblom, 1996).

The X-ray and MRI imaging methods have made it possible to observe the overall lingual configuration of retroflex consonants. Specifically, it was observed that the curling of the tongue tip/blade was accompanied by some raising and flattening of the anterior tongue body, jointly producing a concave tongue shape and a large sublingual cavity (Švarný & Zvelebil, 1955; Balasubramanian, 1972; Ladefoged & Bhaskararao, 1983; Narayanan et al., 1999; Proctor et al., 2009). Švarný & Zvelebil’s (1955) X-ray tracings indicated that the tongue body and the root for the retroflex stop were somewhat fronted compared to the dental stop and the rest position, a finding that was echoed by Narayanan et al. (1999) for Tamil /ɭ/ (vs. /l/). As a consequence of these differences, retroflexes were characterized by greater front and back cavities compared to dentals, acoustically resulting in a substantially lower F3, as well as a lower F4 and higher F2 (Narayanan et al., 1999; McDonough & Johnson, 1997).

Although some dynamic aspects of retroflex production could be inferred from static palatography data (Švarný & Zvelebil, 1955; cf. Firth, 1948 on Marathi; Butcher, 1992 on Australian languages), these could be investigated in much greater detail using electropalatography (EPG) and articulography (EMA). EPG studies of Tamil retroflexes by Krull & Lindblom (1996) and McDonough & Johnson (1997), for example, revealed that the tip-palate contact for retroflexes was more anterior at the offset of the closure than at its onset, indicative of the ‘flapping-out’ movement of the tongue tip (see Hamann, 2003 for a review). As the location of the constriction for dentals did not change over time, the contrast between the two consonants was greater at the beginning of the closure than at its end, where the two articulations were quite similar (cf. Dixit, 1990 on Hindi; Simonsen et al., 2008 on Norwegian; Tabain, 2009a on Central Arrernte). The rapid flapping-out movement of the tongue, accompanied by a relatively short constriction, was also observed for the Malayalam retroflex /ɭ/ using high-speed ultrasound (Scobbie et al., 2013). The asymmetric nature of retroflex production was clearly observed in EMA studies. Narayanan et al. (1999), for example, found that the tongue tip movement for the Tamil retroflex /ɭ/ was on a curve, moving in the clockwise direction (assuming the head facing right) so that the peak of the horizontal (backing) movement of the tongue tip was achieved prior to the peak of the vertical (raising) movement. The shape of the movement for the dental /l/, on the other hand, was straight or slightly curved in the counter-clockwise direction, indicative of the near-simultaneous fronting and raising movement of the tongue tip (cf. figures 8, 9, 14 in Scobbie et al., 2013, on Malayalam /l/ and /ɭ/). The Narayanan et al. study also found that the retroflex was characterized by a greater vertical displacement of the tongue tip, a faster movement towards and away from the constriction, and a shorter constriction compared to the dental (cf. Simonsen et al., 2008 on Norwegian).

3. Retroflexes: Production mechanisms and coarticulation

The complex and asymmetric articulation of retroflexes provides an interesting test case for theories of speech production, and specifically for our understanding of production mechanisms of lingual articulations and their coarticulatory behaviour. In an EMA study of Wubuy (Australian) coronals, Best, Bundgaard-Nielsen, Kroos, Harvey, Baker, Goldstein, & Tiede (2010) proposed that apical and laminal articulations involve distinct patterns of tongue kinematics. Specifically, apicals (retroflexes and apico-alveolars) are produced with “an arch motion” of the tongue tip, characterized by the stabilization of the posterior tongue body and the “lever action” of the tongue tip. In contrast, laminals (lamino-dentals and palatals) are produced with a “forward thrust” of the entire tongue body around a pivot point (see Iskarous, 2005 on the arch/pivot distinction). The two movements are also different in their relative speed: the apical arch-like motion of the tip is fast, while the laminal forward thrust of the tongue body is slow. The mechanism of the proposed stabilization of the tongue body for retroflexes, however, is still poorly understood. Conceivably, it could be achieved by the fronting of the posterior tongue body and the root, which, together with the lateral bracing, would facilitate the rapid flapping-out movement of the tip. Alternatively, some backing of the posterior tongue body and the root may facilitate the retraction of the tongue tip to the palatal region. In fact, researchers disagree on the role and the position of the tongue body in retroflexes (Bhat, 1974; Hamann, 2003; Narayanan et al., 1999), and the paucity of imaging data is compounded by possible language-particular differences in retroflex articulations. Our pilot ultrasound investigation of Kannada geminate consonants (Kochetov, Sreedevi, Kasim, & Manjula, 2012b) provided evidence for the fronting of the tongue body as the stabilization method. Specifically, the posterior tongue body for the retroflex stop /ʈ/ produced by four speakers (2 males and 2 females) was more front than for dental /t/ and similar to alveolopalatal /ʧ/. These differences were further quantitatively confirmed in Kochetov, Sreedevi, & Kasim (2012a).

The view of retroflexes as produced with a stabilized tongue body suggests that these consonants involve a greater degree of articulatory constraint, as defined by the DAC model of coarticulation (Recasens, Pallarès, & Fontdevila, 1997). Given this, retroflexes should resist coarticulatory effects of adjacent segments and, in turn, exert strong coarticulatory effects on the latter. Given the spatially asymmetric production of retroflexes (with the tip being more retracted at the onset of the closure than at its offset), it is reasonable to expect retroflexes to exert stronger anticipatory coarticulation than perseverative coarticulation. Indeed, McDonough & Johnson (1997) observed that the curling of the tongue for the Tamil /ɭ/ could start as early as the first half of the preceding vowel. This was manifested in a substantial lowering of F3 through much of the vowel. Apart from this and a few other studies (Dave, 1977 on Gujarati; Tabain, 2009a on Central Arrernte), however, the questions of retroflex coarticulatory effects have not been investigated.

Another issue that has received little attention is the role of non-lingual articulators in retroflex production. Švarný & Zvelebil’s (1955) X-ray tracings, for example, suggested a somewhat lower jaw position for Tamil retroflexes than for dentals. Similarly, Tabain (2009b) found that the Central Arrernte retroflexes were produced with a lower jaw, compared to other coronal consonants (apical alveolars, laminal dentals, and apical palato-alveolars). She proposed that the jaw movement is part of the retroflex production mechanism, as it can facilitate the retraction and the forward movement of the tip. If this is correct, we may expect the lowering of the jaw for retroflexes to be simultaneous with the retraction of the tongue tip towards the hard palate.

The current study attempts to contribute to our understanding of the mechanism of retroflex production and of the coarticulatory effects of retroflex consonants. It extends and refines the ultrasound method of analyzing tongue shapes used in our pilot study and combines it with the articulography method to track the movement of the tongue tip and the jaw. This study also involves a relatively large sample size – 10 speakers – and was therefore expected to provide a basis for stronger generalizations about articulatory properties of Kannada retroflexes, and Dravidian retroflexes in general.

4. Material and methods

4.1 Participants

Ten native speakers of Kannada, 5 females (KF1-KF5) and 5 males (KM1-KM5), participated in the study. They were 21-26 years old, with a mean age of 23.9; all were students and staff at the All India Institute of Speech and Hearing (AIISH) in Mysore. All the participants grew up in southern Karnataka (mostly in Mysore), except for one male subject from Shimoga, central Karnataka. As is typical for educated Kannada speakers, all of them were multilingual, reporting English (all), Hindi (6), Tamil (2), and Telugu (1) as their L2 and L3. The participants reported no speech or hearing problems.

4.2 Materials

The materials for the current study consisted of 3 meaningful Kannada words with geminate voiceless stops/affricates of three places of articulation: /atta/ ‘that side’ with a dental /t/ (see footnote 1), /aʈʈa/ ‘garret’ with a retroflex /ʈ/, and /akka/ ‘elder sister’ with a velar /k/. The words were given in Kannada orthography. The low vowel context /a/ was chosen as maximally spatially different from the lingual consonant articulations. It should be noted that this vowel tends to be lengthened in word-final position and shortened and raised before consonants (Schiffman, 1983: 6), resulting in forms that can be transcribed as [ʌt̪ːaˑ], [ʌʈːaˑ], and [ʌkːaˑ]. (Stress does not appear to play a role in Kannada; Sridhar, 1990: 301.) Geminates were used to ensure that the ultrasound system (see section 4.3) could produce several frames over the duration of the consonant constriction. Geminate consonants in Kannada are described as twice as long as singletons, at least when occurring after short vowels (Schiffman, 1983: 8).

The target words were randomized together with other Kannada words used for a separate study and presented on a laptop screen. Each word was presented 10 times in a row, with an interstimulus interval of 1 second. The pause between different word trials was 3 seconds. The reading of the list was expected to produce 300 tokens of the target words (3 word items * 10 trials * 10 participants).
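The presentation scheme and the expected token count described above can be sketched as follows. This is a minimal illustration in Python, not the presentation software used in the study; the function names, the word labels, and the optional seed are hypothetical, and the filler words from the separate study are omitted.

```python
import random

# Target words as given in the Materials section (phonemic labels only).
TARGET_WORDS = ["atta", "aʈʈa", "akka"]
REPETITIONS = 10      # each word is presented 10 times in a row
N_PARTICIPANTS = 10


def build_presentation_list(words, repetitions, seed=None):
    """Randomize the word order, then repeat each word in a block."""
    rng = random.Random(seed)  # seed is a hypothetical convenience for reproducibility
    order = list(words)
    rng.shuffle(order)
    return [word for word in order for _ in range(repetitions)]


def expected_tokens(n_words, repetitions, n_participants):
    """Number of target tokens the reading of the list should yield."""
    return n_words * repetitions * n_participants


if __name__ == "__main__":
    trials = build_presentation_list(TARGET_WORDS, REPETITIONS, seed=1)
    print(len(trials), "trials per participant")
    print(expected_tokens(len(TARGET_WORDS), REPETITIONS, N_PARTICIPANTS), "tokens in total")
```

Blocking the repetitions after shuffling keeps the 10 trials of a word adjacent, as in the description above, while the order of the blocks varies.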

4.3 Ultrasound

4.3.1 Instrumentation and the procedure

Ultrasound data were collected using a PI 7.5 MHz SeeMore ultrasound probe (Interson Corporation, http://www.interson.com/) with a 90-degree field of view and a depth of 10 cm. The ultrasound transducer emits high frequency sound waves that are reflected from the surface of the tongue, returning its 2-dimensional image (see Stone, 2005 for an overview of the method). According to the manufacturer’s specifications, the frame rate for the system is 15 frames per second. The probe was connected through a USB port to a laptop computer. It was placed under the participant’s chin and stabilized using a probe stabilization headset by Articulate Instruments (Scobbie, Wrench, & van der Linden, 2008). Stabilization was necessary to minimize the extraneous movement of the probe with respect to the participant’s head. The image received from the ultrasound probe was displayed on the computer screen using the SeeMore software (version 1.3.02) and transferred to a Sony DVDirect MC6 multi-function DVD recorder via a PC-to-TV converter. The videos were recorded as uncompressed .VOB files, at a frame rate specified by the manufacturer as 29.97 fps, the NTSC standard rate. The audio signal was captured using an AT831b lavalier microphone (at the sampling rate of 48 kHz) via an XLR cable connected to a Sound Devices USBPre2 pre-amp and transferred to the DVD recorder for synchronization with the video. The data were collected in a quiet room in the Speech Sciences Department at the AIISH. Prior to the data collection, the participants read through the entire word list to familiarize themselves with the materials and the task.
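The frame rates reported above imply the following nominal timing relations, sketched here in Python. The helper names are hypothetical; the 200 ms value in the example is the average closure duration reported for these data, and the calculation ignores any frame duplication or dropping in the recording chain.

```python
import math

ULTRASOUND_FPS = 15.0   # manufacturer's specification for the probe
VIDEO_FPS = 29.97       # NTSC rate of the DVD recording


def frame_period_ms(fps):
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frame_offset_ms(n_frames, fps):
    """Time spanned by n_frames consecutive frame periods."""
    return n_frames * frame_period_ms(fps)


def full_frames_within(duration_ms, fps):
    """Number of complete frame periods that fit into a given duration."""
    return math.floor(duration_ms * fps / 1000.0)


if __name__ == "__main__":
    print(round(frame_period_ms(ULTRASOUND_FPS), 1), "ms per ultrasound frame")
    print(round(frame_period_ms(VIDEO_FPS), 1), "ms per video frame")
    # Roughly two video frames are recorded per ultrasound image.
    print(round(VIDEO_FPS / ULTRASOUND_FPS, 2), "video frames per ultrasound frame")
    print(full_frames_within(200.0, ULTRASOUND_FPS), "ultrasound frames in a 200 ms closure")
```

On these nominal figures, a geminate closure of about 200 ms spans about three ultrasound frame periods, which is consistent with the motivation for using geminates.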

4.3.2 Data preparation and analysis

A total of 289 out of the 300 intended tokens were analyzed (with the other 11 tokens discarded due to poor imaging or mispronunciation errors). Still frames corresponding to the estimated maximum constriction (henceforth the ‘max frame’) of /t/, /ʈ/, and /k/ were extracted from the video based on the criteria described in Kochetov et al. (2012b). For each token, 6 additional frames were selected: the 10th, 5th, and 2nd frames before the max frame (referred to as -10, -5, -2) and the 2nd, 5th, and 10th frames after the max frame (referred to as 2, 5, 10). Given the frame rate of the video (approximately 30 fps, i.e. one frame every 33.3 ms), frames -10 and 10 occurred 333 ms prior to and after the max frame, respectively.2 Acoustically, these landmarks corresponded to the time points prior to the onset of the preceding vowel and close to the offset of the following vowel, respectively. Thus the first frame can be considered to capture the tongue in its neutral position, while the last frame captures the tongue approaching this position. The selected frames were imported into the EdgeTrak program (version 1.0.0.2; Li, Kambhamettu, & Stone, 2005), and the tracing of the tongue surface was performed following the procedure described in Kochetov et al. (2012b).3

2 The analysis of duration showed that the preceding vowel was on average 68 ms, while the following vowel was 178 ms. The duration of the consonant closure was on average 200 ms and the consonant release was 19 ms. Both were somewhat shorter for the retroflex and longer for the velar.

3 The tracings were initially done by one of the authors (MK); subsequently, a subset of 289 frames (14% of all tracings) was fully re-traced by the first author (AK) to verify consistency of the results. The original and re-traced contours (shown in Supplementary Material 1) were very similar, with an average vertical difference of 1.1 mm for two /aʈʈa/ contours, 0.9 mm for /atta/ contours, and 2.7 mm for /akka/. The greater difference for the latter was due to the less optimal imaging of the posterior tongue body for the velar. The original tracings were thus deemed reliable for the purposes of the study, and were used in the current analysis.

The shape of the front part of the palate and the alveolar ridge for each speaker was estimated from the uppermost and frontmost positions of the tongue during saliva swallowing between trials (cf. Stone, 2005). The number of such tokens ranged from 5 to 13, with an average of 8 per participant. An informal observation of maximum constriction images for /aʈʈa/ also revealed the consistent presence of a short bright line above and parallel to the surface of the anterior tongue body. This line tended to be at the level of the hard palate. We interpreted this line as indicative of the approximate location of the tongue tip/underside contact with the palate for /ʈ/, and traced it separately (at its lowermost edge). Tokens of the presumed /ʈ/ contact could be obtained for all speakers (on average 8 tokens per participant) except KF2.4 Sample images illustrating the palate and the retroflex contact are shown in Figure 1.

4 Since the rapid curling of the tongue tip can introduce artifacts in ultrasound images (see Wrench & Scobbie, 2006, 2011), it was desirable to verify our interpretation of the data. This was done in a follow-up ultrasound and electropalatography recording of the first author. While not a speaker of Kannada, he was trained to produce Kannada-like retroflexes. The results showed a similar white line appearing during the retroflex constriction with and without the palate, and at different degrees of the tongue retraction. (See Supplementary Material 2 for details.)

[Insert Figure 1]

The extracted tongue contours were first examined qualitatively, with the retroflex max frames used to determine the direction of the tongue tip (sub-apical, apical, or laminal) and the approximate location of the constriction (alveolar, post-alveolar, or palatal). Differences between retroflex and dental tongue contours were further analyzed statistically using two kinds of analyses. The first analysis employed smoothing spline analyses of variance (SS-ANOVA) to evaluate statistical differences over the entire curves for /ʈ/ and /t/ at each of the 7 time points. The SS-ANOVA method (Gu, 2002) has been increasingly used for the analysis of ultrasound data (Baker, 2006; Davidson, 2006), as it provides a convenient tool for comparing two tongue shapes holistically – over the entire curves, rather than at a number of selected points. The SS-ANOVA analyses in this study were performed using the assist package for the R programming language (version R 2.14.1; www.r-project.org/), separately for each participant. For the purposes of the comparison, each tongue curve was divided into three regions, roughly corresponding to the blade, the anterior tongue body, and the posterior tongue body. Two contours were considered to be significantly different if their 95% Bayesian confidence intervals did not overlap within at least 2/3 of the respective region. This is shown in Figure 2, where the difference between /aʈʈa/ and /atta/ at frame -10 is limited to the posterior tongue body, which is higher and more back for /t/ than /ʈ/. At the max frame (frame 0), the differences are observed in all 3 regions, with /ʈ/ having a more front and lower posterior tongue body, a partly higher anterior tongue body, and a considerably higher blade. At frame 10, the difference is limited to the tongue blade, which is higher for /t/ than /ʈ/.

[Insert Figure 2]
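The region-wise decision rule used with the SS-ANOVA output (non-overlap of the 95% confidence intervals within at least 2/3 of a region) can be sketched in Python as follows. This covers only the decision step, not the spline fitting itself, which was done in R. The function names are hypothetical, and splitting the curve into three equal parts is an assumption made for illustration; the regions in the study correspond only roughly to the blade, the anterior tongue body, and the posterior tongue body.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound of a confidence interval


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def split_regions(n_points: int, n_regions: int = 3) -> List[range]:
    """Index ranges of n_regions contiguous, near-equal parts (an assumption)."""
    if n_points < n_regions:
        raise ValueError("need at least one point per region")
    bounds = [round(i * n_points / n_regions) for i in range(n_regions + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_regions)]


def significant_regions(
    ci_a: Sequence[Interval],
    ci_b: Sequence[Interval],
    n_regions: int = 3,
    threshold: float = 2 / 3,
) -> List[bool]:
    """For each region, report whether the two curves count as different.

    A region counts as different when the intervals of the two curves fail
    to overlap at a proportion of its points that reaches the threshold.
    """
    if len(ci_a) != len(ci_b):
        raise ValueError("both curves must be sampled at the same points")
    results = []
    for region in split_regions(len(ci_a), n_regions):
        separated = sum(
            1 for i in region if not intervals_overlap(ci_a[i], ci_b[i])
        )
        results.append(separated / len(region) >= threshold)
    return results
```

Calling `significant_regions` on the interval bounds of two fitted curves returns one Boolean per region, in the order in which the curve points are listed.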

The second analysis of the contours examined differences between /ʈ/ and /t/ in horizontal and vertical displacement (cf. Kochetov et al., 2012b). ‘Closing displacement’ and ‘opening displacement’ were defined as differences between the maximum frame contour and the contour at frames -10 or 10 respectively (all averaged over 10 tokens). The first measure corresponded to the movement of the tongue from an inter-speech interval (ISP) towards the target, and the second one from the target towards another ISP (see Gick, Wilson, Koch, & Cook, 2004 on the articulation during ISPs). These differences were calculated for each of 100 X and Y coordinates of the contours and averaged, giving 4 values per word for each speaker (closing displacement X and Y, opening displacement X and Y). The values obtained from 10 participants were input into repeated measures ANOVAs with factors Gender (2 levels), Consonant (2 levels: /ʈ/ and /t/), and Displacement (2 levels: closing and opening) or analyzed using t-tests.
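The displacement measures defined above can be sketched in Python as follows. The function and key names are hypothetical, and the sign convention (max frame contour minus reference contour for both measures) is an assumption; the text specifies only that the measures are differences between the max frame contour and the contours at frames -10 and 10.

```python
from statistics import mean


def mean_contour(tokens):
    """Average several token contours point by point.

    Each contour is a list of (x, y) points; all contours are assumed to
    have the same number of points (100 in the study).
    """
    return [
        (mean(p[0] for p in points), mean(p[1] for p in points))
        for points in zip(*tokens)
    ]


def displacement(max_contour, ref_contour):
    """Mean horizontal and vertical difference between two contours."""
    dx = mean(m[0] - r[0] for m, r in zip(max_contour, ref_contour))
    dy = mean(m[1] - r[1] for m, r in zip(max_contour, ref_contour))
    return dx, dy


def closing_and_opening(frames):
    """Four displacement values for one word of one speaker.

    `frames` maps the frame labels -10, 0 (max frame), and 10 to lists of
    token contours.
    """
    start = mean_contour(frames[-10])
    target = mean_contour(frames[0])
    end = mean_contour(frames[10])
    closing_x, closing_y = displacement(target, start)
    opening_x, opening_y = displacement(target, end)
    return {
        "closing_x": closing_x,
        "closing_y": closing_y,
        "opening_x": opening_x,
        "opening_y": opening_y,
    }
```

Under this convention a positive vertical value means that the tongue is higher at the max frame than at the reference frame, for closing and opening alike.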

4.4 Articulograph (EMA)

4.4.1 Instrumentation and the procedure

The movement of the tongue tip and the jaw was examined in a separate session using an AG500 electro-magnetic midsagittal articulograph (EMA; Carstens Medizinelektronik, Germany). Two reference sensors were placed on the mastoid and the nasal bridge, and the other four sensors on the mid vermilion borders of the upper and lower lips, on a midsagittal section of the tongue about 1 cm from the tip, and on a midsagittal section of the mandible, using 6 channels in total. The kinematic data were recorded with a sampling rate of 200 Hz along with the time-aligned speech acoustic data at 16 kHz. The calibration of the AG500 was done according to the guidelines provided by the manufacturer, Carstens Medizinelektronik, Germany. See Yunusova, Green, & Mefferd (2009) and Hoole & Zierdt (2010) for details on the AG500 calibration and procedure. The data were collected in a specially designed lab in the Speech Pathology Department of AIISH. The presentation of the stimuli was the same as in the ultrasound experiment.

4.4.2 Data preparation and analysis

Native Carstens software was used for sensor position calculation and head movement normalization, as prescribed by the manufacturer (CalPos_2 and NormPos respectively; see http://www.ag500.de/manual/ag500/JustView.pdf), without further rotation of the data with respect to the occlusal plane (cf. Yunusova et al., 2009; Hoole & Zierdt, 2010). Prior to the analysis, the data were band-pass filtered and smoothed using an 11-point Butterworth filter with cut-offs of 0.5 Hz and 10 Hz to remove low-frequency DC drift and high-frequency noise respectively. After that, a total of 292 out of 300 intended tokens (with the remaining 6 tokens involving mispronunciation errors or EMA processing errors) were subjected to several kinds of analysis (using MATLAB, Version 7.10, Release R2010a, The Mathworks, Inc.). First, the entire vertical (Y) and horizontal (X) trajectories of the tongue tip for /aʈʈa/, /atta/, and /akka/ were examined. These consisted of 90 frames (450 ms) before and 90 frames (450 ms) after the consonant constriction midpoint, thus effectively including the target utterance with the preceding and following inter-speech intervals. The trajectories for multiple tokens were aligned based on the midpoint of the consonant constriction (as determined by the findgest procedure, see below) and normalized by taking the first frame (-90) of /akka/ as the baseline. SS-ANOVA (see above) was employed to determine whether there were statistically significant differences between the trajectories for /aʈʈa/ and /atta/, again separately for each speaker. These differences were evaluated within temporal regions determined by the gestural analysis (see below).
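The alignment and baseline normalization can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB code used in the study. The function names and synthetic trajectories are hypothetical, and including the midpoint frame itself in the window (181 frames in total) is an assumption.

```python
SAMPLE_RATE_HZ = 200      # EMA sampling rate reported in section 4.4.1
FRAMES_EACH_SIDE = 90     # 90 frames = 450 ms on either side of the midpoint


def extract_window(trajectory, midpoint_index, half_width=FRAMES_EACH_SIDE):
    """Cut a window of half_width frames before and after the constriction midpoint."""
    start = midpoint_index - half_width
    end = midpoint_index + half_width
    if start < 0 or end >= len(trajectory):
        raise ValueError("trajectory too short for the requested window")
    return trajectory[start:end + 1]


def normalize_to_baseline(window, baseline_value):
    """Express every sample relative to the baseline position (in mm)."""
    return [value - baseline_value for value in window]


# Hypothetical one-dimensional (Y) trajectories in mm, for illustration only.
atta_y = [0.05 * i for i in range(400)]
akka_y = [2.0 + 0.01 * i for i in range(400)]

atta_window = extract_window(atta_y, midpoint_index=200)
akka_window = extract_window(akka_y, midpoint_index=200)

# The first frame of the /akka/ window serves as the zero point.
baseline = akka_window[0]
atta_normalized = normalize_to_baseline(atta_window, baseline)

# Duration covered on each side of the midpoint, in ms.
window_ms = FRAMES_EACH_SIDE * 1000 / SAMPLE_RATE_HZ
```

Filtering is left out of the sketch because the exact filter design is not recoverable from the description above.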

The gestural analysis was performed on the tongue tip trajectories of /aʈʈa/ and /atta/ using MView, a Matlab-based toolbox developed for visualization and analysis of EMA data by Mark Tiede (Haskins Laboratories). Specifically, the function findgest was used to label the landmarks indicated in Figure 3: the onsets and offsets of the gesture (G_ON and G_OFF) and the consonant constriction (C_ON and C_OFF), velocity peaks, and the trajectory maximum value (MAX). The labeling was based on the tongue tip trajectory that showed the maximum overall displacement. For /aʈʈa/, this was consistently the vertical trajectory (raising; Y); for /atta/ this was either the vertical trajectory (Y, for 4 speakers) or the horizontal trajectory (X, fronting; for 6 speakers) of the tongue tip.⁵ Spatial and temporal values corresponding to the gestural landmarks were extracted and used to calculate the following measures:

Closing Displacement, the difference (in mm) between the maximum X/Y value of the tongue tip and its value at the onset of the gesture (MAX – G_ON);



Opening Displacement, the difference (in mm) between the maximum X/Y value of the tongue tip and its value at the offset of the gesture (MAX – G_OFF);



Duration (in ms): intervals of the closing movement (the movement towards the constriction, C_ON – G_ON), the constriction (the gestural plateau, C_OFF – C_ON), and opening movement (the movement away from the constriction G_OFF – C_OFF).

(See Gafos, Kirov, & Shaw, 2010 for a detailed description of the findgest function; see also Shaw, Gafos, Hoole, & Zeroual, 2011 for an application of the MView analysis). To take the trajectory in Figure 3 as an example, the closing vertical displacement of the retroflex gesture is 16 mm (MAX 15.5 mm minus G_ON -0.5 mm), its opening vertical displacement is 20 mm (MAX 15.5 mm minus G_OFF -4.5 mm), the closing movement is 110 ms (22 samples), the constriction is 215 ms (43 samples), and the opening movement is 115 ms (23 samples).

⁵ Simultaneous annotation of X and Y trajectories (using tangential velocity; cf. Gafos et al., 2010) was also attempted, but in many cases did not produce consistent results for retroflexes, whose X trajectory was strongly asymmetrical.
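The measures defined above can be reproduced for the Figure 3 example with a short Python sketch. It does not use MView or findgest; it only applies the stated definitions to landmark values. The class and function names are hypothetical, and the absolute frame indices are assumed so that the intervals match the reported sample counts (22, 43, and 23).

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 200                    # EMA sampling rate (section 4.4.1)
MS_PER_FRAME = 1000 / SAMPLE_RATE_HZ    # 5 ms per sample


@dataclass
class GestureLandmarks:
    g_on_frame: int    # gesture onset (G_ON)
    c_on_frame: int    # constriction onset (C_ON)
    c_off_frame: int   # constriction offset (C_OFF)
    g_off_frame: int   # gesture offset (G_OFF)
    g_on_mm: float     # tongue tip position at G_ON
    max_mm: float      # trajectory maximum (MAX)
    g_off_mm: float    # tongue tip position at G_OFF


def gesture_measures(lm):
    """Displacements in mm and interval durations in ms, per the definitions above."""
    return {
        "closing_displacement_mm": lm.max_mm - lm.g_on_mm,
        "opening_displacement_mm": lm.max_mm - lm.g_off_mm,
        "closing_ms": (lm.c_on_frame - lm.g_on_frame) * MS_PER_FRAME,
        "constriction_ms": (lm.c_off_frame - lm.c_on_frame) * MS_PER_FRAME,
        "opening_ms": (lm.g_off_frame - lm.c_off_frame) * MS_PER_FRAME,
    }


# Positions are taken from the Figure 3 example; frame indices are assumed.
figure3 = GestureLandmarks(
    g_on_frame=0, c_on_frame=22, c_off_frame=65, g_off_frame=88,
    g_on_mm=-0.5, max_mm=15.5, g_off_mm=-4.5,
)
measures = gesture_measures(figure3)
```

With these inputs the sketch gives 16 mm and 20 mm for the closing and opening displacements, and 110 ms, 215 ms, and 115 ms for the three intervals, matching the values quoted for Figure 3.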

Finally, differences between retroflex and dental constrictions in the vertical position of the jaw were examined. These measurements were made at the midpoint of the gestural constriction for the tongue tip, as defined by findgest, and evaluated using a repeated measures ANOVA. The results are presented below separately for the ultrasound and EMA experiments.

[Insert Figure 3]

5. Results: Ultrasound

The presentation of the ultrasound results begins with a qualitative comparison of the tongue shapes and estimation of the tongue tip configuration and the constriction location (5.1). These are followed by an SS-ANOVA analysis of tongue shapes over time (5.2) and the gestural analysis (5.3).

5.1 Overview

Figure 4 presents average maxima for /k/, /t/, and /ʈ/ for 5 female (left) and 5 male speakers (right). It can be seen that all 10 speakers produced the retroflex /ʈ/ with the characteristic anterior concavity resulting from the curling of the tip and the flattening or slight lowering of the anterior tongue body. The posterior tongue body is fronted and held at about a 45-degree angle, showing a relatively flat shape for most speakers. This, together with the lowering of the anterior tongue body, creates a convex shape in the middle part of the tongue. In contrast to the retroflex, the contact for the dental consonant is made with the moderately raised tip/blade of the tongue, presumably at the alveolar ridge and the upper teeth. The shape of the tongue is overall lowered and flat, with the posterior tongue body somewhat backed. The point of contact for the velar stop is not visible for most speakers, and is presumably at the velum. The tongue body is strongly convex. Overall, the constriction for /ʈ/ appears to be almost equidistant from those for /t/ and /k/.

As can be inferred from the tracings of the contact location and the palate, 9 out of 10 speakers seem to be making the retroflex constriction with the underside of the tongue (sub-apical). For speaker KF2, the constriction is apparently made with the tongue tip proper (apical). Examining the location of the constriction together with the palate contour, we can observe that the contact is clearly in the palatal region for speakers KF3, KF5, KM2, and KM5; it is clearly limited to the post-alveolar region for KF2 and KM3; and it appears to be borderline, at the prepalatal arch (where this can be seen) and spanning parts of both the post-alveolar and palatal regions, for KF1, KF4, KM1, and KM4. (See Catford, 1977: 142–143 and Ladefoged & Maddieson, 1996: 11–15 on the phonetic segmentation of the roof of the mouth.) Thus the typical realization of the retroflex stop in our data is sub-apical palatal or sub-apical post-alveolar.

[Insert Figure 4]

[Insert Figure 5]

Figure 5 presents average maxima for the word /aʈʈa/ at 3 points in time – at the maximum constriction (as in Figure 4), and 10 frames before and after the maximum. It can be seen that for all speakers, the latter two frames are relatively similar, being characterized by a retracted posterior tongue body (somewhat more back, or [a]-like, for the last frame) and lowered anterior tongue body and the blade (somewhat lower for the last frame). In contrast to these frames, the posterior tongue body for the max frame is substantially fronted; the anterior tongue body is raised; and the blade is raised and retracted.

5.2 Tongue shape differences over time


The results of the SS-ANOVA (see section 4.3.2) are summarized in Table 1, separately for the tongue blade (a), the anterior tongue body (b), and the posterior tongue body (c). The cells in the table provide numbers of speakers for whom particular differences (ʈ > t or t > ʈ) were significant. Shaded cells indicate dominant patterns (5 or more speakers). (Individual results are given in Supplemental Material 3.) It can be seen that the tongue blade (a) for retroflex /ʈ/ was significantly higher than for dental /t/ for all 10 speakers at the max frame, and for 7 out of 10 speakers at frames -2 and 2. The earlier and later frames showed greater variability, with differences in both directions or no significant differences. In contrast, the anterior tongue body was always higher for retroflex /ʈ/ than for dental /t/, whenever there was a significant difference. The temporal span of these differences was also wider – commonly from frame -5 to frame 5. Finally, the posterior tongue body showed the reverse pattern: this region of the tongue was always higher and more back for dental /t/ than for retroflex /ʈ/, whenever the differences were significant. Notably, more speakers showed this difference at frame 2 than at the preceding frames. This is to be expected if the tongue body fronting is part of the flapping-out movement of the tongue front, which is most prominent at the release of the retroflex.

[Insert Table 1]

5.3 The tongue displacement

To compare the retroflex and dental articulations in terms of the magnitude of the tongue movement, its displacement was calculated as a mean absolute difference between the contour at the max frame and the contours at frames -10 (closing displacement) and 10 (opening displacement), both vertically and horizontally (see section 4.3.2). As Figure 6 shows, both vertical and horizontal displacement values were higher for retroflex /ʈ/ than for dental /t/. These differences were found to be significant in a repeated measures ANOVA (Word: F(1, 8) = 46.121, p < .001 for Y and F(1, 8) = 6.382, p < .05 for X). The magnitude of the difference, however, was considerably greater for the vertical dimension than for the horizontal dimension. The vertical displacement was also higher for the opening movement (away from the constriction) than for the closing movement (towards the constriction); however, the difference seemed to hold for /aʈʈa/ only (Displacement Type: F(1, 8) = 82.916, p < .001; Word * Displacement Type interaction: F(1, 8) = 4.816, p = .059). This analysis was based on absolute displacement averaged over all points of the contour. Naturally, however, some parts of the tongue (especially the blade) moved more than others, and some movements were in opposite directions (backing for /ʈ/ and fronting for /t/). On average, during the production of retroflexes, the frontmost visible point of the tongue (the blade) was raised by 15.5 mm and retracted by 5 mm. In contrast, the raising and fronting of the same point for the dental was 7.5 mm and less than 2.5 mm respectively.

[Insert Figure 6]

5.4 Summary

The results of the ultrasound experiment showed that the retroflex constriction can be categorized for most speakers as sub-apical post-alveolar or palatal. The consonant exhibited a characteristic convex shape of the front part of the tongue resulting from curling back of the tip/blade and flattening of the anterior tongue body. Most speakers showed some fronting of the posterior tongue body for the retroflex compared to the dental and the velar, as well as compared to the preceding and following inter-speech intervals. Spatial differences between /ʈ/ and /t/ were not limited to the max frame, but often extended to at least one frame (for the tongue blade and the posterior tongue body) or two frames (for the anterior tongue body) before and after the constriction. This corresponds to an interval of up to 334 ms. The absolute displacement of the tongue was greater for the retroflex than for the dental, particularly in the vertical dimension. The part of the tongue that showed the largest displacement was the blade (or the frontmost visible point of it). An obvious limitation of the ultrasound analysis presented in this section is the lack of precise information about the location of the tongue tip proper (beyond its retroflex constriction) and its movement over time. The results of the EMA experiment presented below help address this limitation.

6. Results: EMA

This section first presents an overview of the tongue tip trajectories over time (6.1) and their statistical evaluation (6.2). This is followed by the gestural analysis of the tongue tip movement (6.3) and the results for spatial differences in other articulators (6.4).

6.1 Overview

Figure 7 plots average tongue tip trajectories for the entire /aʈʈa/ and /atta/ utterances, separately for each speaker. These trajectories are normalized by taking the onset of /akka/ movement as zero. Gestural landmarks – the onset and offset of the gesture and the constriction (see section 4.4.2) – are indicated on the retroflex trajectory by circles and triangles respectively. (See Supplemental Material 4 for separate X and Y trajectories over time.) It is clear from the plots that for all the speakers, the tongue for the retroflex consonant moves on a wide curve in a clockwise direction (the head facing to the right): up and back during the closing interval, further up in the first half of the closure, forward and down in the second half of the closure, and further forward and down through much of the opening interval. In contrast, the tongue tip for the dental moves along a very slightly curved clockwise trajectory (except for KM1, who showed a counter-clockwise trajectory), with very little change in position between the onset and offset of the constriction. While both consonants involve raising of the tongue tip, the magnitude of this movement is in general considerably higher for the retroflex. The two consonants exhibit antagonistic movement in the horizontal dimension: back for the retroflex and forward for the dental. The difference between dental and retroflex trajectories at the peak of this movement was on average 8 mm for females and 7 mm for males. For half of the speakers (KF2, KF3, KF4, KM1, KM2), the peak of the retroflex retraction occurred during the closing movement, prior to the consonant constriction. This is in contrast to the peak of raising, which for all speakers occurred during the constriction. Further, for all speakers, the onset of the constriction was more posterior than the constriction offset.
All this indicates that the tongue for the retroflex consonant was retracted considerably before the closure, and this retraction could begin prior to the onset of the preceding vowel. During the closure, the tongue moved forward, and then continued (and accelerated) its forward movement after the closure had been released. Speakers varied considerably in the extent of the forward movement of the tongue during the closure, which ranged from 1 mm to 5.5 mm. Finally, it is worth noting that for most speakers the tongue tip was more posterior and often higher even at the onset and offset of the /aʈʈa/ trajectory, compared to the baseline /akka/ position (zero). In a similar way, the tongue tip was more anterior at the onset and offset of /atta/. This suggests that coarticulation to /ʈ/ and /t/ extended well into the inter-speech pause intervals, being both anticipatory and perseverative.

6.2 Tongue tip trajectories over time

SS-ANOVA was used to confirm the observed differences between the X and Y trajectories in the retroflex and dental trials, separately by speaker and by 5 temporal intervals (see section 4.4.2; see Supplemental Material 5 for individual results). The results were very similar across the speakers. As shown in Table 2, the vertical trajectory for the retroflex was higher than for the dental during the consonant constriction (all speakers), the closing interval (all speakers), and the opening interval (8 speakers). In addition, 7 speakers showed a vertical difference during the pre-gesture interval, and 4 of those speakers showed the same effect during the post-gesture interval. The horizontal trajectory for the retroflex was more back than for the dental during the entire gesture – the constriction and the closing/opening intervals (all speakers), as well as during the pre-gesture (9 speakers) and post-gesture intervals (8 speakers). These results thus show extensive coarticulatory differences, more so for the horizontal than the vertical tongue tip movement. The X-Y asymmetry is also reflected in the magnitude of the retroflex-dental differences, as shown in the last two rows of the table. Note also the greater magnitude of the difference (both X and Y) in the intervals before the constriction than after it – further emphasizing the dominance of anticipatory coarticulation.


[Insert Figure 7]

[Insert Table 2]

6.3 The tongue tip gesture

Figure 8 (a) displays mean displacement values for the closing and opening intervals of /aʈʈa/ and /atta/ (see section 4.4.2). Both closing and opening displacement values were greater for retroflex /ʈ/ than for dental /t/. These differences were confirmed in a 2-way repeated measures ANOVA with within-group factors Word and Interval (Word: F(1,9) = 31.398, p < .001). The retroflex displacement was also greater in the opening interval than in the closing interval (Interval: F(1,9) = 10.773, p < .01; Word * Interval interaction: F(1,9) = 14.001, p < .01). On average, the opening displacement was 12.6 mm for /aʈʈa/ and 9.3 mm for /atta/; the closing displacement was 15.5 mm for /aʈʈa/ and 9.4 mm for /atta/. Duration means for the closing, constriction, and opening intervals are shown in Figure 8 (b). These differences were evaluated using paired-sample t-tests. The retroflex consonant had a longer closing movement and a shorter constriction than the dental (t(9) = -4.178, p < .01; t(9) = 2.422, p < .05). The two gestures were similar in the duration of their opening intervals. For both consonant gestures, constriction intervals were longer than either the closing or opening intervals (p < .01-.05). The closing movement for the dental was shorter than its opening movement (t(9) = -6.120, p < .01); the two intervals were not different for the retroflex. On average, the /ʈ/ gesture had a 125 ms closing interval, a 168 ms constriction, and a 121 ms opening interval; the same intervals for the /t/ gesture were 103 ms, 204 ms, and 118 ms respectively.
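As a reminder of how a paired-sample comparison of this kind is computed, the following minimal Python sketch derives the t statistic and its degrees of freedom from per-speaker values. The function name and the numbers are hypothetical and are not the study's data. The p-value is omitted because it requires the t distribution, which the standard library does not provide.

```python
import math


def paired_t(values_a, values_b):
    """Return (t, df) for a paired-sample t-test on two equally long lists."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_value = mean_diff / math.sqrt(variance / n)
    return t_value, n - 1


# Hypothetical durations (ms) for three speakers, for illustration only.
condition_a = [120.0, 130.0, 140.0]
condition_b = [110.0, 110.0, 110.0]
t_value, df = paired_t(condition_a, condition_b)
```

With one mean value per speaker and 10 speakers, the same computation yields 9 degrees of freedom.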

6.4 The jaw displacement

The results of a repeated measures ANOVA for the jaw vertical displacement (see section 4.4.2) revealed a significant effect of Word (F(2,9) = 8.350, p
