Graded Constraints in English Phonology

Brent Vander Wyk
Department of Psychology and Center for the Neural Basis of Cognition
Carnegie Mellon University

Proposal for a dissertation to be submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Psychology

Committee: James L. McClelland (chair), David C. Plaut, Lori L. Holt, Sheila E. Blumstein (Brown University)

Introduction

The goal of phonological theory is to provide a description of the phonological structure of languages. Typically, this problem has been approached with a bias toward binary distinctions. The theorist is interested in describing which forms are allowed in the language and which are not, and the theoretical constructs invoked to make this description are limited to binary decisions of legality. The following analyses and proposed research focus instead on graded structure and the constraints that may give rise to it. Limitations on human memory and on the ability to produce distinguishable sounds may both motivate the reuse of phonological material. If the language made no further distinctions beyond legality, one would expect the pattern of reuse to be uniform. This is not the case. Instead, there is gradation in how commonly forms are reused, with some forms being very common, others uncommon, and still others not occurring at all. The hypothesis pursued here is that certain factors serve as constraints on phonological forms. Violations of these constraints accumulate to influence both whether a form exists and how common it is. Ultimately the sources of phonological constraint will need a formal definition. We can speculate that they arise from articulatory, perceptual, and communicative considerations (Lindblom, MacNeilage, & Studdert-Kennedy, 1984; Redford, Chen, & Miikkulainen, 2001), with the best forms being the ones reused the most (Landauer & Streeter, 1973). Such an account is beyond the scope of the present work. For now, progress can be made by identifying the factors and describing how they combine to structure the gradedness of the language, without having to give an account of their source. We begin with an informal inspection of a subset of the data. Table 1 shows the average count per vowel for selected English monomorphemic monosyllabic rhymes.
Only those rhymes that include an oral stop in the coda are displayed, a limitation that is discussed below. The major divisions in the table are between rhymes with a long or short vowel, and between forms that contain an unvoiced consonant in the coda and those that do not (coda voicing). Also shown are various common embellishments such as the addition of a nasal or liquid. Several patterns are apparent in these data. First, there is massive variability in the commonness of the rhymes. The rhymes listed in the table range from 19.2 to 0.2 words per vowel. This simple observation underscores how insufficient binary distinctions alone are in accounting for the variability in the data. Second, this variability is structured, albeit noisily so. Rhymes with embellished codas are much less common than the simpler unembellished ones. Coda voicing and vowel length also affect how common the rhymes are; voiced codas and long vowels tend to reduce a rhyme’s commonness. There is also a bias toward coronal gestures. This pattern suggests that, all else being equal, short vowels are preferred over long ones, codas with unvoiced consonants are preferred over voiced ones, coronals are preferred over non-coronals, and unembellished codas are preferred over embellished ones. These preferences account for why /Vt/ is more common than /VVlt/, since the latter has a long vowel and a more complicated coda. Third, among the factors that give this gradedness structure, no single parameter is clearly dominant. Considering the factors separately, both the voiced coda rhymes and long vowel rhymes
seem to be reduced compared to unvoiced short vowel rhymes. Taken together, the voiced long vowel rhymes are even more reduced. This suggests that the constraints accumulate gradedly, with each violation adding its own penalty and thereby reducing the commonness of the overall form. These observations lead to the conclusion that there is significant graded structure in the commonness of rhymes that can be described by the accumulation of violations of a handful of constraints. An important concept is that pairs of forms can be related to one another in terms of these constraints. The rhyme /Vt/ (where V stands for ‘any short vowel’, and VV for ‘any long vowel’) can be related to /Vd/, /Vk/, or /VVt/ by a violation of the voicing, coronality, or vowel length constraint, respectively. According to the observations above, this means that /Vt/ should be more common than these related forms. This relationship can be formalized generally:

(1) Given phonological constraints $C_{1..N}$ operating over forms $F_{1..M}$, if $F_j = F_i + C_x$ then:
(a) $F_i \Leftarrow F_j$ (Existence Implication)
(b) $F_i \geq F_j$ (Commonness Implication)
(c) $!F_i \Rightarrow\ !F_j$ (Non-existence Implication)

Here ‘$F_1 = F_2 + C$’ means that form $F_1$ is formed by applying a change to $F_2$ that violates constraint $C$, ‘$X \Leftarrow Y$’ means the existence of $Y$ entails the existence of $X$, ‘$!X \Rightarrow\ !Y$’ means the non-existence of $X$ entails the non-existence of $Y$, and ‘$X \geq Y$’ means that $X$ is at least as common as $Y$. Just as /Vt/ can be related to /Vd/ by a constraint violation, /Vd/ can be related to /VVd/ by a violation. In fact, successive rhymes can be related to one another: /Vt/ to /Vd/, /Vd/ to /VVd/, /VVd/ to /VVg/, and so on. Each step is an application of (1), and therefore carries the set of entailments. This means that (1) can be used to structure chains of related forms:

(2) Given phonological constraints $C_{1..N}$ operating over forms $F_{1..M}$, such that $F_{i+1} = F_i + C_x$:
(a) $F_i \Leftarrow F_{i+1} \Leftarrow F_{i+2} \Leftarrow F_{i+3} \Leftarrow \dots$
(b) $F_i \geq F_{i+1} \geq F_{i+2} \geq F_{i+3} \geq \dots$
(c) $!F_i \Rightarrow\ !F_{i+1} \Rightarrow\ !F_{i+2} \Rightarrow\ !F_{i+3} \Rightarrow \dots$

Each additional constraint violation added to form $F_i$ results in new forms $F_{i+1}$, $F_{i+2}$, $F_{i+3}$, and so on. Point (2a) states that the existence of a form with more violations implies the existence of all related forms with fewer violations. Point (2b) states that a form can be at most as common as all the related forms with fewer violations. And point (2c) states that the non-existence of a form implies the non-existence of all related forms with more violations. In this way the simple graded accumulation of constraint violations allows us to make powerful assertions about the existence and relative commonness of related forms. The kind of frequency-sensitive structure described in (2) has not been of primary interest in linguistics, despite probabilistic information having been shown to affect a variety of linguistic and non-linguistic tasks such as goodness judgments (Coleman & Pierrehumbert, 1997; Frisch, Broe, & Pierrehumbert, 2004; Frisch, Large, & Pisoni,
2000), nonword repetition (Vitevitch & Luce, 1998), speech errors (Goldrick, 2004), phoneme identification (Pitt & McQueen, 1998), and recognition memory (Frisch et al., 2000). Even in work where preferences were discussed, they are typically only of secondary importance and are left unintegrated with the systems of formal rules or constraints that serve to explicitly rule out illegal forms. As a result, the power of this simple graded accumulation of constraint violations has been overlooked. For example, Harris (1994) describes a set of a little more than a dozen rules which elegantly manage to rule out virtually all unattested rhymes in English monomorphemic monosyllables without ruling out any of the attested ones. One highly specific rule is “for rhymes of the form /VVlX/, where X is a stop, X must be a coronal.” This rule eliminates the form /VVlg/ and several others. However, the form /VVlg/ also violates many constraints, since it has a long vowel and its coda is embellished, voiced, and non-coronal. Since each violation reduces a form’s commonness, /VVlg/ might be eliminated without recourse to a rule, by preferences alone. The data used above, and indeed in the analyses that follow, are limited to the rhymes of monomorphemic monosyllables. Taking this as the unit of analysis satisfies several practical concerns. First, there are distributional differences between multi- and monosyllables and between multi- and monomorphemic forms. Syllables embedded in multisyllabic words tend to be simpler and may interact with prosodic context (Hammond, 1999). Moreover, determining the syllabification of multisyllabic forms is not trivial. Multimorphemic syllables can also form phonological constructions not seen in monomorphemic forms.
For instance, although the allophonic variation of the regular past tense marker depends on properties of the preceding phoneme, certain postvocalic combinations are allowed in multimorphemic syllables that are illegal as monomorphemic rhymes, e.g. the rhyme in ‘beeped’, /VVpt/ (Burzio, 2000). Whether these distinctions will be necessary in the long run is an open question. But as there were good a priori reasons for suspecting that these forms differ, only monomorphemic monosyllables will be examined. Furthermore, only the rhyme portion of the syllable has been analyzed so far. Classical phonological theory proposes a syllabic partitioning based on an onset and a vowel-coda rhyme (Fudge, 1969), motivated partly by the fact that legal syllables can be built out of, more or less, any combination of legal onsets and rhymes. The legality of a rhyme or onset, by contrast, depends not just on the independent legality of its constituents but also on the way in which those constituents are combined. One of the main points of this paper is that relying exclusively on the binary notion of legality misses a great deal of the graded dependencies. Therefore, we should be wary of adopting any distinction predicated on this binary assumption. In fact, some previous analyses have found dependencies between onsets and codas (Diver, 1979). This may mean that further investigations of the graded dependencies will be necessary for a complete account of the whole syllable, or even of the rhyme or onset. However, at the whole-syllable level forms are not reused in the way they are at the level of onsets or rhymes, so investigations of gradedness there would require a different kind of methodology.
Moreover, the dependencies were relatively weak, and a substantial body of work supports the validity of the onset-rhyme syllable structure (Frisch et al., 2000; Pierrehumbert & Nair, 1995; Treiman, 1988; Treiman & Kessler, 1995, 1997; Treiman, Kessler, Knewasser, Tincoff, & Bowman, 2000). Rhymes are slightly more interesting
candidates for analysis than onsets for the simple reason that they involve a greater variety of potential constraints because they include both a consonant cluster and a vowel. For this reason the first analyses were all performed on the rhyme portion of the syllable. Consequently, the proposed empirical follow-ups also focus on the properties of rhymes and the constraints that operate over them. For the sake of completeness a set of analyses, analogous to the ones about to be described, will be performed on onsets for the final dissertation.
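
The implications in (1) can be made concrete with a small Python sketch. It is an illustration only: the `Rhyme` encoding, the function names, and the counts are hypothetical and are not taken from Table 1. The sketch assumes that two forms are related when they differ in exactly one property and the change moves from the preferred value (short vowel, unvoiced, coronal, unembellished) to a dispreferred one.

```python
from typing import NamedTuple, Optional

class Rhyme(NamedTuple):
    """Hypothetical encoding of a rhyme type by the four properties in the text."""
    long_vowel: bool
    voiced: bool
    place: str                    # "coronal", "velar", or "labial"
    embellishment: Optional[str]  # None, "nasal", "liquid", ...

# The preferred (violation-free) value of every property.
UNMARKED = Rhyme(long_vowel=False, voiced=False, place="coronal", embellishment=None)

def added_violation(f_i, f_j):
    """Return the property violated if F_j = F_i + C_x, otherwise None."""
    diffs = [name for name in Rhyme._fields
             if getattr(f_i, name) != getattr(f_j, name)]
    if len(diffs) != 1:
        return None  # identical forms, or more than one step apart
    name = diffs[0]
    # The single change must go from the preferred value to a dispreferred one.
    if getattr(f_i, name) == getattr(UNMARKED, name):
        return name
    return None

def commonness_mismatches(forms, counts):
    """Return (F_i, F_j, C_x) triples where (1b) predicts F_i >= F_j but F_i is rarer."""
    out = []
    for i, f_i in forms.items():
        for j, f_j in forms.items():
            c = added_violation(f_i, f_j)
            if c is not None and counts[i] < counts[j]:
                out.append((i, j, c))
    return out

forms = {
    "/Vt/": Rhyme(False, False, "coronal", None),
    "/Vd/": Rhyme(False, True, "coronal", None),
    "/Vg/": Rhyme(False, True, "velar", None),
    "/VVt/": Rhyme(True, False, "coronal", None),
}
# Made-up words-per-vowel values, for illustration only.
counts = {"/Vt/": 10.0, "/Vd/": 5.0, "/Vg/": 6.0, "/VVt/": 4.0}

print(commonness_mismatches(forms, counts))
```

With these made-up counts the only flagged pair is /Vd/ and /Vg/, because the non-coronal form is assigned a higher count than its coronal counterpart. Forms that differ in two properties, such as /Vt/ and /Vg/, are not compared directly; they are related only through a chain of single steps, as in (2).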

Previous Work

General Methods

The following analyses were based on a set of monosyllabic monomorphemic words drawn from the spoken British English CELEX database. The lemma phonology subset in CELEX has most of the inflected forms removed, and this set was further filtered to remove the multisyllabic forms, as well as other potentially morphologically complex forms (e.g. strength and length, because of their relationship to strong and long). This leaves a set of 3474 words. In this investigation the chief unit of analysis was the ‘rhyme type’. As described above, a rhyme type consists of categorical information about the vowel and the consonantal coda. Previous work has acknowledged the importance of vowel length in phonology, and many facts about English can be accounted for by categorizing vowels as long (VV) or short (V). Thus in the rhyme type the specific vowels were replaced by this generic classification. For example, the rhymes in the words ‘shield’ and ‘old’ were both treated as the same rhyme type, /VVld/. Likewise, the words ‘cat’ and ‘bet’ were both /Vt/ rhyme types. The short vowels used in the study were those in the following example words: pit, pet, pat, putt, and pot. The long vowels were those in: bean, bay, buy, no, boon, brow, boy, barn, born, burn. Since the vowels were transcribed in a British pronunciation, a vowel followed by /r/, as in barn, born, and burn, was considered a single vowel. Some further distinctions among the vowels are taken up later. Since different numbers of vowels contribute to the long vowel and short vowel versions of rhyme types, the counts were normalized by dividing by the number of contributing vowels. So for the rhyme type /VVld/, which occurs in twelve words, the rhyme type count was 12 words / 10 long vowels, or 1.2 words per vowel. For short vowel rhyme types, the total counts were divided by five.
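
The normalization just described can be sketched in a few lines of Python. The function and constant names are hypothetical. The sketch assumes rhyme types are written as strings in which a leading "VV" marks a long vowel and a single leading "V" a short one.

```python
# Number of vowels contributing to each vowel class, as listed in the text.
N_SHORT_VOWELS = 5    # pit, pet, pat, putt, pot
N_LONG_VOWELS = 10    # bean, bay, buy, no, boon, brow, boy, barn, born, burn

def words_per_vowel(rhyme_type, word_count):
    """Normalize a raw word count by the number of contributing vowels."""
    # Assumed notation: "VVld" is a long-vowel rhyme type, "Vt" a short-vowel one.
    if rhyme_type.startswith("VV"):
        n_vowels = N_LONG_VOWELS
    else:
        n_vowels = N_SHORT_VOWELS
    return word_count / n_vowels

# The worked example from the text: /VVld/ occurs in twelve words.
print(words_per_vowel("VVld", 12))  # 1.2
```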
The approach taken with the codas is to classify them based on a primary gesture and possible embellishments such as the addition of a nasal or liquid. For instance, the coda /nt/ would be classified as having a stop with a nasal embellishment. The full set of embellishments will be discussed below. One embellishment is worth discussing in advance: the post-primary-gesture coronal stop. This is used in codas such as /ft/ or /kt/. If the primary gesture is a /t/ or /d/, then this embellishment would create the coda /tt/ or /dd/, which do not occur. This is one of a family of embellished forms that require a pair of adjacent coronal stops. There is a significant difference between identical stop-stop pairs and non-identical ones like /pt/. In the latter, the tongue tip gesture is able to overlap the previous closure: it can be prepared during the previous gesture’s closure, allowing its own closure to be initiated immediately as the previous gesture is released. It is because of the overlapping
character of the /t/ in the /pt/ coda that we count it as an embellishment. However, the second /t/ in /tt/ is not an overlapping gesture: it cannot be prepared, and its closure cannot be initiated during the previous release, so counting it as an embellishment would be a mistake. Producing this form in running speech would be difficult, and it would likely be perceived with an intervening vowel, making this gesture sequence a multisyllabic form. The same situation arises with another pair of embellishments, the pre- and post-primary-gesture coronal fricative embellishments, which also create adjacent identical segments when they are applied to an /s/ or /z/ (creating /ss/ or /zz/).

According to the formalism described above in (1) and more generally in (2), the set of rhymes can be ordered with respect to their underlying rate of occurrence based on how those forms violate constraints. Each form can potentially be related to multiple forms, some of which violate more constraints and some of which violate fewer. These relationships in underlying rate of occurrence allow predictions to be made about observed rates of occurrence and, in special cases, about whether forms will exist or not. However, forms are not all related to one another, so these relationships only allow partial orderings to be described. Figure 1 illustrates how these partial ordering relationships will be shown. Arrows connect related forms. According to (1), the less constrained form should have a higher rate of occurrence. This also means that the arrow can be read directly as the existence implication in (1), and the opposite direction of the arrow as the non-existence implication in (1). The generalized forms of these implications in (2) are implicitly depicted in paths made by arrows between successively related forms. On the left side of the figure, the set of partial ordering relationships defined by the constraints of vowel length, voicing, and coronality is shown among the unembellished stop rhymes.
This pattern of relationships is recapitulated on the right among the nasally embellished forms. Furthermore, the constraint against nasal embellishment implies relationships on a one-to-one basis for each pair of unembellished and embellished forms (e.g. /Vnt/ → /Vt/, /VVnd/ → /VVd/, etc.). Two of these relationships are depicted in the figure by dashed arrows; in general, only a single arrow will be used to depict the relationships between a cluster of embellished forms and the corresponding cluster of unembellished forms. Several of the embellished forms are grayed out. This indicates that the rhyme type is not attested in English. If the observed rates of occurrence of two forms do not match the prediction from (1), the arrow between them is drawn as a dash, calling attention to this mismatch. However, since the observed relationships are subject to a certain degree of noise, further statistical tests will be required to determine if the anomaly is significant. The following sections will describe the partial ordering of both stop- and fricative-based rhyme types in turn. The full set of counts compiled for these rhyme types is listed in Table 2 (note that the stop rhyme data is relisted from Table 1).

Analysis 1: Partial Ordering

Stop Rhymes

Stop rhymes are defined as those rhymes that include a stop consonant (t, d, k, g, p, or b) along with an optional embellishment: a pre-stop liquid, nasal, or /s/, or a post-stop /s/ or /t/. The only other context in which stops occur is /Vft/ and /VVft/, but in these cases the /t/ is considered a post-fricative embellishment and is covered below. Stop rhymes, as defined, are found in 1565 words, a good portion of the corpus. They also
have a rich relationship structure, shown in Figure 2. This figure is structured like Figure 1 except that the relationship between embellished forms and unembellished forms is represented by a single large arrow. Note that only forms with a single embellishment are shown; forms that have more than one embellishment, such as /Vlpt/ as in sculpt, are discussed below. Of the 180 direct implicational relationships, only 9 pairs exhibit a violation of (1) in the observed rate of occurrence. However, two of these are between the rhyme types that use /tt/ and /dd/ as a coda, which are special cases that were discussed above. The remaining pairs are legitimate violations of the predicted relationship from (1). The rhyme /VVld/ is particularly problematic, as it is more common than both /Vld/ and /VVlt/ despite the preferences for short vowels and unvoiced codas. /Vnt/ is also an anomaly in that it is less common than both /VNk/ and /Vmp/. Also among the embellished forms, /Vks/ and /Vps/ are more common than /Vts/. Finally, among the unembellished forms the only violation is /Vg/ being slightly more common than /Vd/. There is a degree of noise in the data that may result in small discrepancies between the predicted and observed rates of occurrence. Since the magnitude of some of the differences is small, several statistical tests were performed to determine if the differences were significant. For pairs of rhyme types that had the same vowel length we were able to perform paired t-tests. When the vowel lengths differed (one long and the other short), a one-way ANOVA was performed. All tests were run over the total counts for the rhyme type, not the normalized per-vowel counts. In the mixed vowel type pairs the long vowel set was reduced to the following set: bean, bay, buy, no, boon. This set of long vowels, used in later analyses, had greater parity with the short vowels and did not include the R-type or very long vowels.
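
For readers who want to see the shape of the paired comparison, the following Python sketch computes a paired t statistic with the standard library. It assumes that each vowel contributes one paired observation (the word count for that vowel in each of the two rhyme types); this pairing is an assumption, since the text does not spell it out. The function name and the counts in the example are made up for illustration and are not data from the study.

```python
import math
import statistics

def paired_t(xs, ys):
    """Return the paired t statistic and its degrees of freedom (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length of at least 2")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    # Note: sd_d is zero when every difference is identical, and the division then fails.
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical per-vowel word counts for two short-vowel rhyme types (five vowels each).
counts_a = [3, 5, 4, 6, 2]
counts_b = [2, 5, 3, 4, 1]
t, df = paired_t(counts_a, counts_b)
print(f"t({df}) = {t:.2f}")
```

Under this pairing, five short vowels give 4 degrees of freedom and ten long vowels give 9. Obtaining a p value would additionally require the t distribution, which is not in the Python standard library.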
The differences between many of the discrepant rhyme types did not reach significance: Vg/Vd, t(4) = -0.30, ns; VVld/VVlt, t(9) = 1.23, ns; Vps/Vts, t(4) = -0.78, ns. However, two of the differences were significant: Vks/Vts [t(4) = 6.04, p < 0.01]; VVld/Vld [F(1,8)= 5.618 p