Incomplete neutralization in voice assimilation and final devoicing in Russian: Toward a model of gradient “cumulative” voicing

Vladimir Kulikov a,*, Bob McMurray b

a Department of English Literature and Linguistics, Qatar University, P.O. Box 2713, Doha, Qatar

b Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA 52242

* Corresponding author: Vladimir Kulikov. Tel.: +974-4403-7583; Fax: +974-4403-4901; Email: [email protected]

Abstract

A critical issue at the interface of phonetics and phonology is whether rule-based phonological processes operate in a more gradient fashion. Phonological processes like voice assimilation and final devoicing typically result in transformation of the underlying voicing category and neutralization in acoustic cues. Previous research has found that speakers can leave traces of underlying voicing in some cues and that phonological assimilatory processes may be incomplete in phonetics. The paper investigates acoustic cues to voicing in Russian stops undergoing both final devoicing and voicing assimilation, and estimates the cumulative effects of integration across cues. We analyzed a corpus of 1117 stops produced by 22 native speakers of Russian in word-internal clusters and in clusters across a word boundary and measured six cues. We then combined information from the cues to compute cumulative voicing indices. We found incomplete neutralization in some cues (vowel duration and F1) and in continuous voicing indices. The results suggest that overall voicing state of assimilated obstruents can be a continuous measure rather than a discrete category. This measure reflects information of the underlying voicing category of assimilated or devoiced stops, suggesting that the outcome of phonological processes does not have to be completely neutralized in production.

Keywords: voice assimilation; final devoicing; cue; gradience; speaking rate; obstruents; Russian

1. Introduction

A crucial issue in phonological theory is whether categories and phonological processes are gradient or discrete. In phonetics and speech perception the answer to this question over the last few decades has largely favored the notion that phonetic and perceptual processes are gradient (Gerrits & Schouten, 2004; Gow, 2001; Gow & McMurray, 2007; Hawkins, 2003; Lindblom, 1996; McMurray & Jongman, 2011; McMurray, Tanenhaus, & Aslin, 2002; Ohala, 1996; Pierrehumbert, 2003; Schouten, Gerrits, & Van Hessen, 2003), and to a lesser extent this is now common in production as well (Goldrick & Blumstein, 2006; Goldstein, Pouplier, Chen, Saltzman, & Byrd, 2007; Ellis & Hardcastle, 2002). Nonetheless, the standard assumption in most phonological theory is of discrete categories (Chomsky & Halle, 1968) and rule-based processes (though see Browman & Goldstein, 1989). This sets up a challenging set of questions concerning the interface between a discrete rule-based phonology and a gradient, variable and perhaps messy phonetics that must implement it (see Smolensky, Goldrick, & Mathis, 2014). Crucially, as phonological theory is increasingly motivated by perceptual and articulatory considerations (Browman & Goldstein, 1989; Beckman, Helgason, McMurray, & Ringen, 2011; Goldrick, 2011; McMurray, Cole, & Munson, 2011), it becomes more important to accurately describe the continuous implementation of phonological processes.

It would be incorrect to say that phonological theory does not in general explain or account for variation (cf. Smolensky et al., 2014). However, in many of the phenomena in which this has been investigated, the relevant phonological theory still assumes a fundamentally symbolic process. For example, cases of optional phonological alternations, such as t/d-deletion in English (Guy, 1980) or t-deletion in Finnish (Keyser & Kiparsky, 1984) are well-documented.
However, crucially in such phenomena the variation is in the application of the phonological process (e.g., whether or not it occurs, or its frequency in various triggering contexts); the result of such a process (when it occurs) is often assumed to be categorical and discrete (Anttila, 1997), and the variation derives from whether or not it occurs for any given string. In other words, speakers do not always delete a segment, but when they do so, no trace of this segment is found in the outcome. Thus, such phenomena largely treat the phonological process as discrete, even if it is variably applied. Even the most recent phonological theories (e.g. Optimality Theory, OT) typically treat variation only as variability between the outcome forms rather than variability within an outcome form. Given the contrast between the clear evidence for gradiency in speech perception / production, and standard assumptions of discreteness in phonology, it would be easy to form

a simple dichotomy between categorical phonology and a gradient phonetic implementation. However, the complex relationship between phonology and phonetics may not be this limited. Cohn (2007) argues that phonology and phonetics interact in two distinct ways. Phonology regulates phonetics (phonology in phonetics in Cohn’s terms) and phonetics, in turn, constrains phonology (phonetics in phonology). Although phonology and phonetics are distinct, this distinction may not be as sharp as is usually assumed in modular theories. Both domains show evidence for gradience and categorical distinction, and languages which exhibit gradient phonology and categorical phonetics are as natural as languages which exhibit categorical phonology and gradient phonetics (Cohn, 2007). Examples of the former (gradient phonology / categorical phonetics) include allophonic or positional variation within a phoneme, when variants of the same phoneme can be represented by phonetically distinct segments (e.g. flapping in American English), but the selection of the particular form follows a more probabilistic pattern. The latter (gradient phonetics / categorical phonology) is illustrated by differences between (phonetic) coarticulation and (phonological) assimilation, the topic of the present paper. Here, there is some evidence that processes like place assimilation are gradient (Ellis & Hardcastle, 2002; Gow, 2001), but these cannot be simply demoted to a simple form of coarticulation (phonetics) for several reasons. First, implementation of phonological assimilation (place or voice) can be language specific. The phonetic outcome of voice assimilation of stops in Dutch (Slis, 1986) is not identical to the outcomes of voice assimilation in French (Snoeren, Hallé, & Segui, 2006) or Hungarian (Gow & Im, 2004). Coarticulation may be language specific as well. Beddor, Harnsberger and Lindemann (2002) argue that languages differ in application of vowel-to-vowel coarticulation.
Zsiga (2000) claims that consonantal overlap in clusters is different in English and Russian. Moreover, Cohn (2007) argues that processes like nasalization or vowel rounding can be both phonetic and phonological. The actual difference between coarticulation and assimilation is, thus, a matter of degree. Therefore, the question to ask may be not whether phonology is categorical or gradient, but rather, how phonetic and phonological gradience interact in phonological processes. An illustrative example of these issues is voicing in Russian stops. Russian has a contrast between (pre)voiced and short-lag voiceless unaspirated stops, which is maintained in word-initial and in word-medial (intervocalic) position. If we consider single intervocalic consonants, voiced stops /b, d, g/ are produced with closure that is predominantly voiced and voiceless stops /p, t, k/ have a closure that is for the most part, voiceless. However, the

phonetic implementation of voicing allows for some gradience: though the duration of voicing in Russian averages 98% of the closure, it occurs in a range between 55% and 100% (Ringen & Kulikov, 2012). Thus, at a purely phonetic level, there does appear to be an opportunity for some gradience in the manifestation of voicing, but it remains an open question whether the continuous aspects of voicing duration are lawfully related to other factors, including phonological regularities. In this regard, stops in Russian undergo a form of regressive voice assimilation. In a phrase like /kot belyj/ (white cat), the /t/ in /kot/ assimilates to /d/ in front of the [b] in /belyj/, resulting in a production of [kod belyj]. This change is often claimed to be the result of a categorical phonological process (e.g. Cho, 1990): speakers always produce fully voiced segments [b, d, g] before voiced obstruents, and they always produce fully voiceless segments [p, t, k] before voiceless obstruents no matter what the underlying segment is. However, if even unassimilated voicing shows gradient or continuous properties, the assimilation may exert a more continuous effect on it. If so, we may need a way to characterize phonological alternations more precisely and in a more continuous fashion. There is consistent evidence that gradience can be found in the aftermath of assimilation processes. That is, assimilation can leave a trace of the underlying category on the modified segment, as seen in work on assimilation phenomena in a variety of languages. For example, in Polish underlyingly voiced word-final stops are devoiced before a vowel or a voiceless stop. However, Slowiaczek and Dinnsen (1985) argue that this devoicing also leaves traces of the underlying voicing in some cues. They found that duration of a preceding vowel and closure duration in such stops is consistent with their underlying voicing category (i.e. voiced) rather than with their surface, phonetic category (i.e. voiceless).
A similar trend is observed for other assimilatory processes. For example, Zsiga (1995) found that when /s/ is palatalized before a /j/ (e.g., in /ðɪs jir/, which assimilates to /ðɪʃ jir/), the resulting /ʃ/ shows frequency centroid values and contact patterns that fall between typical values for /s/ and /ʃ/. Thus, the articulatory and acoustic form shows revealing traces of the underlying /s/. Similarly, work on coronal place assimilation in English (e.g., green boat → greem boat) suggests that assimilated consonants often have gestural properties of both the underlying segment and the conditioning context (Barry, 1985; Kerswill, 1985; Nolan, 1992; Byrd, 1996; Kuhnert & Hoole, 2004). Results like these present a challenge for phonological theory. The presence of traces of underlying categories in assimilated segments suggests that the result of a phonological

process is gradient and incomplete rather than categorical. Phonological assimilatory processes may affect only some cues, leaving others intact or affecting them to a lesser degree. Consequently, assimilated segments may have hybrid acoustic and articulatory properties consistent with both their underlying and phonetic categories. This challenges strictly rule-based or discrete accounts of phonological processes. Moreover, these partial articulations may be beneficial to listeners. Listeners are highly sensitive to the acoustic consequences of partial assimilation and can use this to both anticipate upcoming sounds and resolve the ambiguity created by assimilation (Gow, 2001, 2003; Gow & McMurray, 2007). Thus, these gradient (and language-specific) modifications are a functionally useful part of the language. Moreover, such gradiency may derive from lawful properties of the production system. For example, Goldrick and Blumstein (2006) suggest that when the production system is partially considering multiple phonemes, intermediate acoustic forms can result, with different cues playing a different role in communicating both underlying and surface forms. Should one really exclude such lawful behavior from the phonology solely because it does not fit into the dominant, discrete framework (see also Beckman et al., 2011)? As part of a broader investigation of a more gradient phonology, it is important to precisely characterize the conditions under which gradience is observed in assimilation. For example, two of the aforementioned studies showing a gradient pattern of assimilation only investigated assimilation across a word boundary (Slowiaczek & Dinnsen, 1985; Zsiga, 1995), where prosodic structure could affect assimilatory processes. The consonants across a word boundary belong to two different words and it is unlikely that speakers have motor programs for each possible combination of words. As Slis (1986, p. 312) argues, “clusters across word boundaries are new combinations and have therefore no engrained assimilations”. Hence, less coherence may be expected in assimilated stops across a word boundary. Consequently, the presence of incomplete assimilation within a prosodic word would constitute much stronger evidence. Thus, it is crucial to investigate phonological assimilatory processes not only across a word boundary but also in word-internal clusters. We are not aware of detailed phonetic studies of voice assimilation in word-internal clusters. To our knowledge all previous studies of regressive voicing assimilation (e.g. Slis, 1986; Warner, Jongman, Sereno, & Kemps, 2004, for Dutch; Snoeren, et al., 2006, for French; Gow & Im, 2004, for Hungarian)

examined assimilation across a word boundary.¹ As we argue in the next section, Russian has several properties that make it ideal for such an investigation.

1.1. Voicing in Russian

For the present purposes, Russian has two relevant processes: regressive voice assimilation and final devoicing. It also has a rich morphology that allows concatenation of morphemes within a word, enabling us to investigate assimilation both within words and across word boundaries. Here, we briefly describe both processes to motivate our study. Voice assimilation in Russian occurs in clusters across a word boundary, as well as in word-internal clusters (Avanesov, 1968). The leftmost stop in such clusters assimilates (voices or devoices) to the voicing of the rightmost obstruent, and this can occur for both voiced and voiceless target sounds (voiced sounds can assimilate to voiceless and vice versa):

(1) sa/dk/a → sa[tk]a ‘cage’ Gen.sg.; cf. sa/d/ok → sa[d]ok ‘cage’ Nom.sg.
    molo/tb/a → molo[db]a ‘threshing’; cf. molo/t/itj → molo[t]itj ‘to thresh’ Inf.

Similarly, roots can concatenate with lexical items that begin with a voiced or voiceless obstruent, resulting in voice assimilation of a word-final stop at the phrase level:

(2) sa/d p/osadili → [tp] ‘the orchard is planted’
    ko/t b/elyj → [db] ‘the cat is white’

Voicing assimilation in Russian was studied by Burton and Robblee (1997), who found strong evidence of incomplete voice assimilation in obstruent clusters. Underlying voiced and voiceless stops and fricatives, with the same post-assimilation voicing, differed in the duration of closure and voicing, and also in the amplitude of voicing. These differences were stronger in fricatives than in stops. Although Burton and Robblee investigated a typical case of voice assimilation in Russian, they looked at obstruent clusters only at a clitic boundary (between a preposition and a content word), which is strictly speaking not a word-internal position.² Thus, it remains an open question whether incomplete assimilation is seen within a word (where it is more likely to be complete).

¹ Slis (1986) looked into voice assimilation in compounds, which might not be a clear case of word-internal clusters. Jessen and Ringen (2002) argue that compounds in Germanic languages act as phrases in phonological processes.

² Russian prepositions are proclitics that do not constitute a separate Prosodic Word (Selkirk, 1995), but a boundary between a preposition and a following word still separates two lexical domains and often has the same effect in Russian phonology as a word boundary. Russian (Surface) Palatalization before a front vowel /i/ is obligatory word-internally, but it does not occur across a word boundary or across a clitic boundary (see Rubach, 2000; Gribanova, 2008, for details).

Russian also has final devoicing. Voiced obstruents are systematically produced without vocal fold vibration at the end of words:

(3) sa/d/ → sa[t] “orchard” Nom.sg.; cf. sa/d/a → sa[d]a “orchard” Gen.sg.
    goro/d/ → goro[t] “city” Nom.sg.; cf. goro/d/a → goro[d]a “city” Gen.sg.
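The strictly categorical view of these two processes can be written down as a toy rewrite rule, which makes clear what "complete neutralization" would predict. The sketch below is only an illustration under simplifying assumptions: it handles single transliterated words, covers only the six stops /p, t, k, b, d, g/, and the function and variable names are hypothetical. It is not a model proposed in this paper.

```python
# Toy categorical rule for Russian stops (illustrative assumption, stops only).
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k"}
VOICELESS_TO_VOICED = {v: k for k, v in VOICED_TO_VOICELESS.items()}
STOPS = set(VOICED_TO_VOICELESS) | set(VOICELESS_TO_VOICED)


def set_voicing(seg: str, voiced: bool) -> str:
    """Return the stop with the requested voicing; other segments pass through."""
    if voiced:
        return VOICELESS_TO_VOICED.get(seg, seg)
    return VOICED_TO_VOICELESS.get(seg, seg)


def surface(underlying: str) -> str:
    """Apply final devoicing, then regressive voice assimilation in stop clusters."""
    segs = list(underlying)
    # Final devoicing: a word-final stop surfaces as voiceless.
    if segs and segs[-1] in STOPS:
        segs[-1] = set_voicing(segs[-1], voiced=False)
    # Regressive assimilation: scanning right to left, each stop copies
    # the voicing of an immediately following stop.
    for i in range(len(segs) - 2, -1, -1):
        if segs[i] in STOPS and segs[i + 1] in STOPS:
            segs[i] = set_voicing(segs[i], segs[i + 1] in VOICED_TO_VOICELESS)
    return "".join(segs)


for word in ["sadka", "molotba", "sad", "sadok", "gorod", "goroda"]:
    print(word, "->", surface(word))
```

Under such a rule the output carries no information about the underlying voicing of the changed stop, which is exactly the prediction that the studies reviewed next call into question.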

It has been reported in several studies that final devoicing is incomplete in Russian just as in Polish (Slowiaczek & Dinnsen, 1985). Pye (1986) reports that the closure duration in underlying voiced stops is shorter than in underlying voiceless stops, and it is affected by place of articulation. The greatest difference (15%) was observed in bilabial stops, but in coronal stops it was almost completely neutralized (2%). Shrager (2006) finds a significant difference in the energy of the burst between underlying voiced and voiceless coronal stops. Dmitrieva, Jongman, and Sereno (2010) report that monolingual speakers of Russian distinguish final obstruents in closure duration, as well as in burst duration. Kharlamov (2014) reports that underlying voiced final obstruents in Russian are produced with a longer duration of glottal pulses, but he claims that incomplete final devoicing is more likely to occur in minimal pairs and in cases when speakers are exposed to orthographic representations. Matsui (2011) argues that speakers of Russian can use these phonetic differences to recover underlying forms in discrimination and identification tasks. Others (e.g. Barry, 1988), in contrast, argue that the final voicing contrast in Russian is largely neutralized, as differences in closure duration, duration of voicing into closure, and duration of a preceding vowel between underlying voiced and voiceless stops do not reach significance. It is intriguing that incomplete neutralization in these studies was found in very different cues. But no study investigated all of the important cues to voicing, or changes in voicing as an integral category. Some authors even claim that since incomplete assimilation is never found in all cues, this constitutes evidence that voice assimilation or final devoicing is complete and categorical (e.g. Keating, 1985; Jassem & Richter, 1989).
However, there is a more compelling alternative explanation for the same pattern: talkers never completely assimilate or devoice obstruents and always leave at least some traces of underlying voicing, perhaps as a benefit to the listeners, or perhaps as the byproduct of an articulatory or production process. Either way, however, these sorts of claims have not been investigated with a full complement of cues for voicing. It has been proposed that voicing is not defined by just one cue and is an articulatory state rather than one gesture (Westbury & Keating, 1986). For word-initial stops, the cues to voicing include VOT (Lisker & Abramson, 1964), F1 (Liberman, Delattre, & Cooper, 1958;

Summerfield & Haggard, 1977), f0 (Haggard, Ambler, & Callow, 1970), duration of a following vowel (Allen & Miller, 1999; Toscano & McMurray, 2012), among many other potential cues. For intervocalic stops, additional cues include voicing during closure (Lisker & Abramson, 1967) and duration of a preceding vowel (Chen, 1970), and F1 and f0 are also informative at both closure and release (see Lisker, 1986). Some of these may be fundamentally related to the articulatory process (laryngeal voicing) while others may serve as enhancing cues (e.g., Stevens & Keyser, 2010). However, as a whole this large set of cues constitutes the information regarding voicing that the speaker is attempting to convey to the listener. In some studies, effects dissociate, with underlying and surface features communicated differently via channels like VOT and vowel length (Goldrick & Blumstein, 2006); however, it is equally important to establish the total amount of information carried across cues. If voice assimilation or final devoicing is in fact incomplete and gradient, as suggested by previous studies, it is crucial to establish two things: (1) which cues show incomplete assimilation and (2) what the overall voicing state of assimilated stops is. Despite the obvious necessity of accounting for the combination of multiple cues to voicing, previous studies usually focused on one or several cues but never on the integral voicing state across cues. While prior approaches offer a good understanding of how individual cues function, a more integral estimate of voicing becomes crucial if the outcome of phonological processes is gradient. That is, to the extent that a phonology (even a gradient phonology) must abstract across specific acoustic cues, we must determine how to quantify voicing (or any other articulatory state) more generally. Little (if any) research has addressed the overall degree of voicing outcome in phonological processes.
The present study offers an approach that attempts to integrate over cues to voicing. Thus, in addition to examining each cue individually, we also integrate across them, using a recently developed computational method.
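To make the idea of integrating over cues concrete, the sketch below combines several cue measurements into a single continuous score by standardizing each cue and taking a weighted sum. This is a generic illustration only: the weights, the cue names, and the three token values are invented for the example, and the computational method actually used in this study is a different one, described later in the paper.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list of measurements to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def voicing_index(tokens, weights):
    """Weighted sum of z-scored cues, one score per token.

    tokens  : list of dicts mapping cue name -> measured value
    weights : dict mapping cue name -> signed weight (hypothetical values);
              a positive weight means larger values count as "more voiced".
    """
    columns = {cue: zscores([tok[cue] for tok in tokens]) for cue in weights}
    return [
        sum(weights[cue] * columns[cue][i] for cue in weights)
        for i in range(len(tokens))
    ]


# Hypothetical measurements (ms) for three stop tokens.
tokens = [
    {"voicing_ms": 60, "closure_ms": 62, "vowel_ms": 95},
    {"voicing_ms": 35, "closure_ms": 70, "vowel_ms": 88},
    {"voicing_ms": 12, "closure_ms": 80, "vowel_ms": 82},
]
# Assumed weights: longer voicing and longer preceding vowel count toward
# "voiced"; longer closure counts toward "voiceless".
weights = {"voicing_ms": 1.0, "closure_ms": -0.5, "vowel_ms": 0.5}

index = voicing_index(tokens, weights)
print([round(x, 2) for x in index])
```

The point of such a score is that it is continuous: a token can fall between the typical voiced and voiceless values even when no single cue separates the categories.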

1.2. Summary and Logic

The present study had the following goals. First, we evaluated the effect of assimilation and devoicing of Russian obstruents in a number of individual cues. Second, we examined the voicing state as an integrated product of many acoustic cues (sensitive, of course, to their differential weighting for the voicing category). Third, we compared the results of voice assimilation in clusters within a word to clusters across a word boundary in order to determine whether incomplete assimilation is a by-product of prosodic structure or a natural and expected component of phonological processes. Finally, we tested the effects of speaking

rate on acoustic cues for voicing and on integral voicing state in assimilated and devoiced obstruents. To achieve these goals, we performed two experiments: one examined word-internal obstruents in clusters and word-finally; the second investigated changes in voicing in obstruent clusters across a word boundary. In both experiments, we measured and analyzed primary and secondary acoustic cues to voicing separately and then calculated a weighted metric for each voicing category. In addition, all tested items were produced in three speaking rate conditions: list, slow, and fast. This allowed us to analyze results of voice assimilation and final devoicing as a function of speaking rate. To be clear, while we have framed this study in terms of the broader issue of gradiency and discreteness in phonology and of the relationship of phonology to phonetics, no one study or phenomenon can fully address these issues. Our study is meant as a case study and perhaps as a stepping stone toward building an approach to these issues more generally.

2. Experiment 1: Voice assimilation and final devoicing across a word boundary

2.1. Method

2.1.1. Participants

Eight native speakers of Russian (four males) participated. They were monolingual speakers who had grown up and resided in Tambov³ (central Russia). Their ages ranged from 21 to 48 (mean age = 37.3). They all were speakers of educated Standard Russian⁴ and had no history of speech or hearing disorders. The subjects were paid a standard hourly rate for their participation.

2.1.2. Items

Items consisted of two-word sequences: a minimal pair on final-voicing (the target), followed by a word that was expected to create assimilation, devoicing or neither (the context). The

³ We did not use Russian speakers residing in the United States to avoid possible interference from English, a language with a different voicing assimilation pattern and no final devoicing (Keating, 1984; Cho, 1990). Time spent in a different country can trigger changes in the voicing properties of sounds in a native language. For example, Dmitrieva et al. (2010) have shown that after staying in the USA for several months, speakers of Russian begin to pronounce voiced and voiceless sounds in their native language in a way that is more similar to English sounds.

⁴ The screening procedure included a short interview to find out whether the participants produce a voiced velar obstruent as a stop [g] (Standard Russian) or as a fricative [ɣ] (Southern Russian dialects). All of the speakers who participated in the experiment produced a voiced velar obstruent as a stop.

list of target items included minimal pairs with final voiced and voiceless stops at three places of articulation (e.g. lug ‘meadow’ – luk ‘onion’). The target words were followed by one of three types of context words: 1) a word beginning with a vowel ( __# V, where final devoicing is expected), 2) a word beginning with a voiceless obstruent ( __# [vl]), or 3) a word beginning with a voiced obstruent ( __# [vd]); the latter two are environments where voice assimilation is possible (depending on the target word). For vowel-initial context words, the vowel was always /o/ (and there were 6 such words). For the stop-initial words, the consonant was labial (6 words), coronal (5 words), or velar (2 words). Heterorganic stop clusters were used to reduce the number of unreleased stops (Zsiga, 2000). The target and context words matched semantically. The list of items is given in the Appendix. In addition to the 18 phrases (3 target places x 2 target voicing x 3 context words), 22 filler phrases were used, which were associated semantically with the tested collocations, but had various obstruent-sonorant clusters across a word boundary. Items were read in two speaking rate conditions: slow and fast. We examined speaking rate in order to make sure our effects held at multiple rates and to investigate whether one category or the other might be more affected by rate (following Beckman, et al., 2011). In the slow condition, the target phrases were pronounced at a comfortable tempo in a carrier phrase Skaži ____ ješče raz (‘Say ____ once again’). The speakers did not have any particular instructions about pauses between words. In the fast condition, the target phrases were pronounced in the same carrier phrase quickly. The speakers were asked to repeat the phrase if they paused after a target word. For each condition, the speakers read the list three times but only the second and third readings were recorded. In total, 72 target phrases (18 phrases x 2 rate conditions x 2 repetitions) were recorded for each speaker.
We discarded 19 word-final tokens (across all of the talkers and repetition) due to the absence of audible and visible (on a spectrogram) release. These tokens were evenly distributed among all speakers; the only exception was Speaker 4, who pronounced eight unreleased stops. Therefore, a total of 557 target stops were selected for the analysis.
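The token counts reported above follow directly from the crossed design, and can be checked with a short enumeration. The factor labels below are shorthand chosen for this sketch; the numbers (three places, two voicing values, three contexts, two rates, two recorded readings, eight speakers, 19 discarded tokens) are those given in the text.

```python
from itertools import product

# Factor labels are shorthand for this sketch; the levels come from the text.
places = ["labial", "coronal", "velar"]          # place of the target stop
target_voicing = ["voiced", "voiceless"]         # underlying voicing of the target
contexts = ["vowel", "voiceless obstruent", "voiced obstruent"]
rates = ["slow", "fast"]
recorded_readings = [2, 3]                       # first reading was not recorded

phrases = list(product(places, target_voicing, contexts))
per_speaker = list(product(phrases, rates, recorded_readings))

n_speakers = 8
discarded = 19                                   # unreleased word-final tokens

total_recorded = n_speakers * len(per_speaker)
analyzed = total_recorded - discarded

print(len(phrases), len(per_speaker), total_recorded, analyzed)  # 18 72 576 557
```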

2.1.3. Recording and Measurement

Talkers were digitally recorded in a quiet room using a one-point condenser SONY ECM-MS907 microphone and an Echo Indigo IO soundcard at 22,050 Hz. The digitized segments were manually marked for boundaries in PRAAT (Boersma & Weenink, 2011). Both the waveform and the spectrogram were used to set the boundaries of a stop and a preceding vowel. Following Jessen (1998), the beginning of the stop closure was marked at the end of the second formant, which typically coincides with a significant drop in amplitude of vocal fold vibration. The end of the closure was marked at the beginning of the release burst. In cases where the first stop in a cluster had a weak release, the difference in amplitude between two voiced stops was used to determine the boundary, where possible. Figure 1 exemplifies a case of juxtaposition of an underlyingly voiced stop across a word boundary before a voiceless stop. In order to define voicing in stops, we measured 1) the duration of glottal pulses into stop closure (voicing duration), 2) the duration of stop closure, 3) the duration of the burst, and 4) the duration of the preceding vowel.

[Figure 1: spectrogram, 0 to 5 kHz, of the sequence k o d[t] p, with the vowel, C1 stop closure, closure voicing, burst, and C2 marked.]

Figure 1. C1 stop closure, devoiced; C2 stop, voiceless (from a token kod podobran, ‘the code is found’, spoken by S1 (f), fast rate).
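Once the boundaries are marked, the four duration measures are simple differences between time points. The sketch below shows one way this could be computed; the `StopToken` structure, its field names, and the example times are assumptions made for illustration and do not reflect the authors' actual annotation files or scripts.

```python
from dataclasses import dataclass


@dataclass
class StopToken:
    """Hand-marked time points for one vowel + stop token, in seconds (hypothetical layout)."""
    vowel_onset: float
    closure_onset: float    # end of F2 in the preceding vowel
    voicing_offset: float   # last glottal pulse in the closure
    burst_onset: float      # end of closure = start of release burst
    burst_offset: float


def cue_durations_ms(tok: StopToken) -> dict:
    """Return the four duration cues in milliseconds."""
    # Voicing is measured into the closure, so it is bounded by the closure itself.
    voicing_end = min(max(tok.voicing_offset, tok.closure_onset), tok.burst_onset)
    durations = {
        "vowel": tok.closure_onset - tok.vowel_onset,
        "closure": tok.burst_onset - tok.closure_onset,
        "voicing": voicing_end - tok.closure_onset,
        "burst": tok.burst_offset - tok.burst_onset,
    }
    return {cue: round(sec * 1000, 1) for cue, sec in durations.items()}


# Invented example token (not data from the study).
example = StopToken(
    vowel_onset=0.100,
    closure_onset=0.190,
    voicing_offset=0.210,
    burst_onset=0.255,
    burst_offset=0.265,
)
print(cue_durations_ms(example))
```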

2.2. Results

The goal of the analyses was to determine the effects of both underlying voicing and assimilation in stop-stop clusters across a word boundary at different speaking rates. If the first stop in the cluster (C1) is assimilated in voicing, the voicing properties of this segment should be consistent with the voicing properties of the second stop in the cluster (C2). Conversely, over and above this effect, any evidence of the underlying voicing of C1 on the C1 phonetic cues would imply that some information regarding underlying voicing is preserved despite the assimilation, a more gradient or partial form of assimilation.

We also examined final devoicing by examining tokens that preceded a word-initial vowel. Here, absence of voicing during closure in C1 stops was interpreted as final devoicing, but any evidence of an effect of underlying voicing would be consistent with a more gradient or partial form of devoicing. Before conducting our primary analysis, we conducted two preliminary analyses to validate our experimental design. First, we examined the effect of the speech rate on vowel and closure duration to determine whether the speaking rate manipulation had the intended effect. Next, we examined the voicing properties of the context consonants (C2) to establish whether the segments that condition voice assimilation are stable. Both of these analyses confirmed the expected effects and are described in the online supplement. Our two primary analyses examined voicing in the initial consonant (C1). The first analysis assessed the effect of assimilatory context on C1 in the different speaking rate conditions. The second analysis examined word-final devoicing. In each of the two analyses we conducted individual ANOVAs on each cue. We start each by describing the analysis of a representative cue (usually the “dominant” cue, though we make no specific claims about that) and then summarize all of the individual analyses. An important goal of this project is to develop ways to generate inferences about voicing as a whole across multiple cues. This was done by developing the computational model of speech production into an analytic tool (see McMurray & Jongman, 2011; Cole, Linebaugh, Munson, & McMurray, 2010; Apfelbaum, Rhone, & McMurray, submitted, for complete details). This is a fairly data-intensive approach, and it required data from both experiments; thus, our synthesis is presented in a separate section after Experiment 2.

2.2.1. Voice assimilation before obstruents

To assess conditions of voice assimilation, we examined the voicing of the target stop in the assimilating contexts including voiced and voiceless obstruents. We start by fully describing the analysis of voice duration, and next we summarize identical analyses conducted on each of three additional cues that were measured. A repeated measures ANOVA was performed on voice duration (in ms). Our primary factors of interest were context voicing (voiceless/voiced, the effect of assimilation) and underlying voicing (voiceless/voiced, evidence for partial assimilation). We were also interested in how speaking rate (slow, fast) may moderate these effects. Finally, the place of articulation of C1 served as an additional factor that was manipulated in the list of items; while we did not have any a priori reason to expect that assimilation would work differently at

different places of articulation, it is a known source of variance in voicing (e.g., Lisker & Abramson, 1964) and we wanted to account for it in our statistical analysis. The results of this ANOVA are summarized in Table 1. Of the main effects, only context voicing affected voice duration (p < .0001). This was due to a strong effect of voice assimilation, with longer voice durations before a voiced C2 and shorter voicing durations before a voiceless C2. There was an interaction of the underlying voicing with context voicing. This was due to the fact that while there was no effect of underlying voicing when the context was voiceless (F(1,7) = 2.06, p = .201), we did find a small but significant effect of underlying voicing when the context was voiced (Mvoiced = 48 ms; Mvoiceless = 52 ms; F(1,7) = 6.70, p < .05). This might be seen to support the idea that some partial evidence of the underlying voicing was available before voiced segments. However, the effect of underlying voicing was not in the predicted direction, with more voice duration when the underlying segment was voiceless, suggesting it may have roots elsewhere. Surprisingly, our speaking rate manipulation had little effect on voice duration with no significant main effect or interactions. There was an interaction of rate with place (p < .001), with dentals showing longer voicing durations in slow speech (t(7) = 3.55, p < .01) and bilabials showing longer durations in fast speech (t(7) = 2.40, p < .05). The overall lack of an effect on voicing duration is somewhat surprising and suggests that duration of voicing is somewhat invariant with respect to rate. However, the fact that closure duration and vowel duration are affected by rate (as described below) suggests that there may be more tokens with incompletely voiced closure in slow speech than in fast speech. To test this prediction, we computed the voicing ratio (VR) as the ratio of voicing duration to closure duration.
This measure is widely argued to be more invariant to speaking rate (though perhaps not here) and a better predictor of listener behavior (Port & Dalby, 1982). In this context, it addresses the issue of how “complete” closure voicing is (which is difficult to do with voicing duration alone). We applied the same repeated measures ANOVA to the voicing ratio (VR) as in the prior analysis (Table 1). As before, there was no effect of underlying voicing on VR but a strong effect of context (p < .001). Again there was an interaction of context with underlying voicing (p < .001), but again this was due to a significant effect of underlying voicing in the voiced context (VRvoiced = 84%, VRvoiceless = 90%; F(1,7) = 6.84, p < .05) but in the wrong direction. The effect of underlying voicing before voiceless C2 was in the correct direction, but only marginally significant (VRvoiced = 30%, VRvoiceless = 25%; F(1,7) = 4.78, p < .065).
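The voicing ratio itself is a one-line computation. The sketch below, with a hypothetical function name and invented example durations, shows why a voicing duration that stays constant across rates still yields a lower ratio when the closure lengthens in slow speech.

```python
def voicing_ratio(voicing_ms: float, closure_ms: float) -> float:
    """Proportion of the stop closure that is voiced (0.0 to 1.0)."""
    if closure_ms <= 0:
        raise ValueError("closure duration must be positive")
    # Guard (an assumption of this sketch): voicing into closure cannot
    # exceed the closure, so measurement noise is capped at 1.0.
    return min(voicing_ms / closure_ms, 1.0)


# Invented durations: same voicing duration, shorter closure in fast speech.
fast = voicing_ratio(voicing_ms=50, closure_ms=55)
slow = voicing_ratio(voicing_ms=50, closure_ms=66)
print(f"fast: {fast:.0%}, slow: {slow:.0%}")
```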

We did observe a more robust effect of speaking rate on this cue (p < .001), suggesting that in slow, more careful speech speakers were less likely to fully voice the closure; this may imply that the laryngeal gesture is not timed to the closure, but rather has its own intrinsic timing. Rate did not interact with underlying voicing, however, suggesting that the lack of information about underlying voicing was similar at both rates. However, there was a marginally significant interaction between rate and context voicing (p = .062). This was due to the fact that the overall rate effect was driven by the stops preceding voiced stops (Mslow = 76%, Mfast = 97%; F(1,7) = 12.01, p < .05); there was no effect of rate before voiceless stops (Mslow = 25%, Mfast = 28%; F(1,7) = 2.63, p = .149).
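The follow-up comparisons above are reported as F(1,7), which matches a two-level within-subject contrast over eight speakers. For such a contrast the repeated measures F statistic equals the square of the paired t statistic, with degrees of freedom (1, n − 1). The sketch below illustrates that relationship on invented per-speaker means; it is not a reanalysis of the study's data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_f(cond_a, cond_b):
    """F statistic and degrees of freedom for a two-level within-subject contrast.

    cond_a, cond_b : per-speaker means in the two conditions, in the same
    speaker order. With two levels, F(1, n-1) equals the squared paired t.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)


# Invented per-speaker voicing ratios (%) for eight speakers.
vr_fast = [98, 95, 99, 96, 97, 99, 94, 98]
vr_slow = [80, 70, 85, 72, 78, 83, 66, 74]

f_value, df = paired_f(vr_fast, vr_slow)
print(f"F{df} = {f_value:.2f}")
```

A full factorial analysis with several within-subject factors needs a dedicated repeated measures ANOVA routine; the paired form only covers the simple two-level follow-ups.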

Table 1. Summary of the statistical test for secondary cues for voicing in C1 stops before obstruents. F values are shown; significant values are given in bold (^ p < 0.1, * p < .05, ** p < .01, *** p < .001). Only significant three- and four-way interactions are shown.