University of Rochester Working Papers in the Language Sciences—Vol. Spring 2000, no. 1 Katherine M. Crosswhite and Joyce McDonough (eds.)
Allowable variability: A preliminary investigation of word recognition in Navajo

Joyce McDonough(1) & Mary Ann Willie(2)
(1) Department of Linguistics, University of Rochester; (2) Departments of Linguistics & American Indian Studies, University of Arizona
Abstract

Because speakers do not produce uninflected or 'base' forms, and listeners do not hear them, the shape of the word lexicon in languages with highly productive word formation processes directly addresses the conflict between morphological theories which assume the primacy of word formation processes (Anderson 1992, Bybee and Moder 1983) and theories of word recognition, such as the Cohort theory, which assume words are stored (Caramazza, Laudanna and Romani 1988, Marslen-Wilson 1978). How does a relationship between inflected forms, or between inflected forms and their more abstract base, get established? One common assumption is that less fluent speakers have less complete grammars and that their mistakes reflect their less complete or 'imperfect' knowledge of structure. Since the productive morphology indicates a more complex word processing device and presumably a more complex word lexicon, these errors may reasonably reflect the principles that underlie the organization of the lexical system. In this study, designed to test the feasibility of this strategy, we produced a list of 100 Navajo forms, half of which were 'correct' Navajo words and half 'incorrect', containing mistakes that less fluent Navajo speakers actually made. The verbs were categorized into 5 groups, reflecting five types of commonly occurring errors. We found that all speakers accommodated 'errors', with differences in the kinds of errors more and less fluent speakers tolerated. The results bear on the issue of the role of context, fluency and morphological structure in the recognition of morphologically complex words.
http://www.ling.rochester.edu/wpls/s2000n1/mcdonough.willie.pdf
McDonough & Willie—Word Recognition in Navajo
1. Introduction. The question of the nature of lexical representation and word processing in morphologically complex languages is one that needs addressing in any theoretical framework. Because speakers do not produce uninflected or 'base' forms, and listeners do not hear them, the shape of the word lexicon in languages with highly productive word formation processes directly addresses the conflict between morphological theories which assume the primacy of word formation processes (Bybee 1995, Anderson 1992, Bybee and Moder 1983) and theories of word recognition, such as the Cohort theory, which assume words are stored (Caramazza, Laudanna and Romani 1988, McClelland and Elman 1986, Marslen-Wilson 1978). How does a relationship between inflected forms, or between inflected forms and their more abstract base, get established? Our working assumption is that less fluent speakers have less complete grammars and that their mistakes reflect their less complete or 'imperfect' knowledge of structure. Since the productive morphology indicates a more complex word processing device and presumably a more complex lexicon, these errors may reasonably reflect the principles that underlie the organization of the lexical system. In effect, the mistakes of less fluent speakers may be seen to indicate where listeners/learners break apart and assemble words. Any systematic differences between less and fully fluent speakers may be seen to have relevance to the structure of the word, and by extension to the structure of the lexicon and the processes that build inflectional paradigms. In this study, we produced a list of 100 Navajo forms, half of which were 'correct' Navajo words and half 'incorrect', containing mistakes that less fluent (not non-fluent) Navajo speakers actually made. The data were drawn from the experience of teaching Navajo language classes by one of the co-authors. The verb forms were categorized into 5 groups, reflecting five types of commonly occurring errors.
None of the speakers, even fully fluent speakers, scored 100% on these tests. We found that all speakers accommodated ‘errors’. However, there were interesting differences in the kinds of errors more and less fluent speakers tolerated. We take this to indicate that fluent speakers deem acceptable, or are willing to accommodate, particular kinds of variability in word processing, storage and retrieval. The results of this feasibility study bear on the
WPLS:UR, vol. S2000, no. 1
issue of the role of context, fluency and morphological structure in the recognition of morphologically complex words.

The second, concomitant, aim of this study is to provide instructional tools for Navajo educators. It is hoped that tests like these will allow more objective measures of a student's fluency and learning, thus facilitating classroom instruction of the Navajo language. We also hope that this study is a basis for work on developmental aphasia.

2. Navajo verbal forms. The following is a short section on the structure of the Navajo verb form. The Navajo verb is a fully inflected form that stands as a complete proposition. Pronominal marking is obligatory in Navajo, and it has been argued that the NPs in Navajo are adjuncts to the verb (Willie 1991, Willie & Jelinek 1995, Jelinek 1989). Every verb in Athabaskan has at least two morphs: the portmanteau mode/subject morph and the verb stem. It has been argued that these two morphs are the base of two distinct syntactic constituents, an auxiliary or 'infl' (I) constituent and a 'verb' (V) constituent (McDonough 1990, 2000). The inflectional constituent (I), called the 'conjunct' domain, holds the morphemes marking mode (tense) and subject; the verb constituent (V) holds the verb stem. Athabaskan verbs are minimally bisyllabic: one morpheme from each of these two constituents is necessary to form a word. Each of these constituents may also include prefixes. The (I) domain has an additional set of object and 3rd person agreement markers on its left edge. There is a third domain in the word, of proclitic-like morphemes that sit at the left edge of the (I) constituent, called the 'disjunct' domain (D). The boundary between the disjunct and conjunct domains has traditionally been marked with '#' in glosses (Kari 1975). Some examples and glosses of verbal constructions are provided below. The disjunct ('D'), conjunct ('I') and verb stem ('V') domains are marked:

(1)
yishcha  'I cry.'  (Young and Morgan 1987)
[ (y)ish ]    [ cha ]
[ ø-imp/1s ]  [ cry:imp ]
I             V
(2)
honílįįd  'I/he came.'  (Young and Morgan 1987)
[ ho + ní ]            [ lįįd ]
[ 3s + n-perf/1/3/s ]  [ 'appeared, came':perf ]
I                      V
(3)
haséłbą́ą́s  'I drove it up.'  (Young and Morgan 1987)
ha #    [ sé ]         [ ł - bą́ą́s ]
'up' #  [ s-perf/1s ]  [ cl - 'move hooplike object':perf ]
D       I              V
In (1) is the minimal verb, with the two obligatory morphs: the mode/subject portmanteau /ish/ (ø-imperfective/1st person singular) and the imperfective form of the verb stem -cha 'cry'. These two constituents must agree in their aspectual specification (here imperfective) (Hardy 1969). The form in (2) contains two conjunct morphs and the verb stem; (3) has morphemes from all three domains. The boundaries between the 'disjunct' and 'conjunct' constituents, and between the 'conjunct' and verb constituents, are areas of phonological activity.

3. The study.
For this study we produced a list of approximately 50 Navajo forms, which were divided into five groups of around 10 pairs each (see figure 3 for distribution). Each of these forms was matched with an 'incorrect' Navajo form, resulting in 100 verbal forms. The 'incorrect' forms reflect mistakes that less fluent Navajo speakers actually make. These errors were collected over several years by the co-author from her experience teaching Navajo language classes. The word list was not easy to put together. Because of the complexity and productivity of the language's morphology, it is difficult to invent 'wug' forms in Navajo, that is, forms that cannot be associated with a meaning. There are two primary reasons for this difficulty. Firstly, since a word is a proposition in Navajo, containing many inflectional morphemes essential to building a proposition, the notion 'new' word is not particularly relevant.1
1 This type of system is a challenge to learnability theories which assume vocabulary development based on word acquisition. It is unclear what it means to have a vocabulary in which most words are both poly-morphemic and associated with propositions. The ecology of this language, for instance the uneven and highly constrained distribution of phonemes, seems to inhibit dependence on notions like vocabulary size to explain the emergence of phonological regularities, as has been offered for English-type systems (see Hay et al. 2000, Caramazza et al. 1988).
The second reason is that segmental contrasts are strongly neutralized outside the stem morpheme, resulting in a great deal of homophony among the large inflectional group of morphemes.2 The stem onset is the single place in the word where the full set of contrasts is found. This reduces the sound variability that would likely be found if the phonemic inventory were distributed across the word.

The problem lies at the heart of the issue. The impetus for this project came from questions the Navajo co-author had about the nature of the 'errors' less fluent speakers were making. The errors are characterized by mismatches between phonological, semantic and syntactic phenomena within the form. It is the mismatches that indicated the disfluencies, rather than an identifiable lack of competence in an autonomous linguistic component, such as phonology or syntax. We feel that attention to this 'autonomous-less-ness' of morphological productivity is an important aspect of this kind of testing.

The forms we collected were categorized into 5 groups, reflecting what we considered five types of commonly occurring errors, ranked in order of 'allowable variation' by a native speaker. The groups are: 1) agreement errors between the two obligatory parts of the verb, 2) aspect mismatches, 3) valence anomalies, 4) postpositional agreement errors and 5) 'disjunct' prefix errors (see the appendix for a more detailed discussion of the list). The errors are all morpho-semantic (valence mismatches) or morpho-syntactic (argument agreement errors), and they are often realized as small phonological changes at a specific locus in the word. There is a difference in the physical location of the error in the word between group 1 and group 5. Group 1 involves errors whose locus is a mismatch between the two obligatory morphs in the verb. Group 5 contains morpheme arrangements that listeners may be more willing to accommodate, or less willing to mark as ungrammatical.
Group 5 errors are on the left edge of the word and involve 'disjunct' morphemes. Unlike classic 'wugs', there are no examples in the list that can be identified as purely phonologically new forms. In the process of building the wordlist and running the subjects, we discerned that a listener's linguistic competence is best graded by what s/he deems to be allowable variability (thus the use of scare quotes around 'good' and 'bad'). Thus, the groups were ranked from least to most acceptable variation; the errors in group one were the most likely to be
2 See Young and Morgan 1987:30ff for a list of the morphemes available to the verb. See also McDonough (1999) for a discussion of the effect of this grouping of functional morphemes on the phonological typology of Navajo.
deemed unacceptable by a fluent native speaker, group 5 the most likely to be accommodated. We will discuss the implications of this below.

3.1 Method. The list of 'bad' and 'good' forms was recorded as spoken by two fluent native Navajo speakers, a male and a female, using a Marantz PMD222 with a head-mounted mic. The speakers practiced the forms on the list before the recording session to avoid stumbling over the pronunciation of the 'bad' forms, which were hard for speakers to produce. The recorded forms were digitized using SoundEdit on a Mac G3. The forms were randomized and presented as auditory stimuli in a response time experiment using Cedrus SuperLab software on a portable Mac. Ten listeners of various levels of fluency were recruited as participants in the study. The participants sat in front of the computer in an office in the American Indian Studies Department at the University of Arizona. Participants wore headphones and were given a button box with color-coded buttons. Participants were instructed in Navajo. Two related forms were presented to them, and they were instructed to choose which one was more correct. The next pair of forms was presented only after a response was given. The test ran about 15 minutes per session. The response times and judgments for each item and subject were collected and analyzed.

3.2 Results. We coded correct responses as responses that chose the 'good' form over the 'bad' form in the pair. Incorrect responses were those that chose the 'bad' form. The chart in figure 1 shows the percentage scores of the ten participants in the study. Note that no one scored 100%, although some of the participants were fluent Navajo speakers. For most subjects, the correct/incorrect responses reflected a better than chance score. However, some listeners did considerably better than others, as the graph in figure 1 shows. Response times (RT's) were recorded for each trial.
The response times varied widely, from immediate responses to several that took as long as 2 or 3 seconds. The raw RT's are not good measures of the listeners' judgments, since the participants were not asked to respond quickly and were given as much time
as they liked for each trial. However, they do serve to indicate a trend in the data. Figure 2 graphs the response times for the wrong and right answers for each of the 5 groups. First, the response times for correct and incorrect judgments were not significantly different. However, there was a difference in the standard errors for the two groups: the standard errors of the response times were consistently smaller for correct judgments than for incorrect ones. Participants were more consistent in their response times when their grammaticality judgments were accurate. It is not clear to the authors what these differences in the standard errors between the two groups indicate, other than a discernible and consistent trend in the data. It may simply be that the correct forms were easier to process than the incorrect ones.
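The standard-error comparison described above is simple to compute. The sketch below is illustrative only: the RT values are invented placeholders, not the study's data, and serve just to show the mean/standard-error computation.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(xs):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return stdev(xs) / sqrt(len(xs))

# Placeholder RTs in milliseconds (invented for illustration, NOT the study's data).
rts_correct = [820, 910, 760, 1010, 880, 940]
rts_incorrect = [700, 1850, 920, 2600, 1100, 1500]

for label, rts in (("correct", rts_correct), ("incorrect", rts_incorrect)):
    print(f"{label}: mean RT = {mean(rts):.0f} ms, SE = {standard_error(rts):.0f} ms")
```

A smaller SE for the correct responses, as reported above, means those RTs cluster more tightly around their mean even when the means themselves do not differ reliably.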
Fisher's PLSD for RT
Effect: type    Significance Level: 5%

Groups   Mean Diff.   Crit. Diff.   P-Value
1, 2     -600.2242    313.5054      .0002     S
1, 3     -372.3242    337.5713      .0307     S
1, 4     -372.9354    337.5713      .0304     S
1, 5     -851.9509    328.1568      .0001     S

At 5%, group 1 was significantly different from all the other groups. Group 5 was significantly different from all groups but group 2. The response times for group 5 were the highest and for group 1 the lowest.
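Fisher's PLSD ('protected least significant difference') compares each pair of group means against a critical difference derived from the pooled error variance. The sketch below is illustrative: it uses a normal approximation to the t quantile, and the example numbers at the end are simply the group 1 vs. group 2 entries from the table above.

```python
from math import sqrt
from statistics import NormalDist

def plsd_critical_difference(mse, n_i, n_j, alpha=0.05):
    """Fisher's PLSD critical difference for comparing groups i and j.
    mse is the ANOVA mean squared error; n_i, n_j are the group sizes.
    A normal quantile stands in for the t quantile (fine for large error df)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt(mse * (1 / n_i + 1 / n_j))

def significant(mean_diff, crit_diff):
    """A pair differs at level alpha when |mean difference| exceeds the critical difference."""
    return abs(mean_diff) > crit_diff

# Applying the decision rule to the table's group 1 vs. group 2 row:
print(significant(-600.2242, 313.5054))  # True, matching the 'S' in the table
```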
[Figure 2: bar graph of response times (ms) by group (1-5) for wrong and right responses. N per group: wrong 41, 42, 25, 32, 33; right 69, 78, 65, 58, 67.]

Figure 2. The response times by group for the correct and incorrect responses. Bars indicate standard error.
4. Discussion. There are several points of interest in this study. First, this was a feasibility study, done to test the design and suitability of this sort of task for Navajo speakers. There are very few studies of morphological processing in languages of this complexity. This is compounded by the relative lack of a complete description of the language's structure, although Navajo is among the best described indigenous languages.  We believe that the results of this study indicate that the design is practicable.

One issue that arises is the nature of the differences between the good and bad forms. Often these involve very small phonological changes in the word. The difference, for instance, between the pair in 2.6 (wrong versus correct 'I went over the edge and down on all fours') is vowel length. This is not, however, a phonological error. The vowel length difference in this pair indicates a fluent speaker's awareness of the phonological behavior of a particular class of prefixal morphemes, here the disjunct <ch'í>
'horizontally outward', and its interaction with the aspectual marking of the inner constituents. The short vowel is found in the imperfective form of the verb. The form in 2.6 is the perfective, built with the perfective form of the stem <-dloozh>. Young and Morgan (1987:274d) describe it this way: the disjunct prefix sometimes occurs with the perfective form of the verb. When it does, the disjunct prefix takes the phonological form <ch'éé>; only when it occurs with the imperfective does it take the phonological form <ch'é>. (Young and Morgan 1987:274dff give paradigms of these and related verb forms.) A gloss follows (<hadah> 'downward from a height', <ch'í> 'outward over an edge', <l-dlóósh> 'move on all fours')3:

hadah ch'ééldloozh    *ch'éldloozh
hadah     ch'í #   [ [ yí ] [ l - dloozh ] ]y-perf
'down..'  'out..' #  [ y-perf/1s ] [ cl - 'move on fours':perf ]

While a phonological description of this vowel length phenomenon is possible, a phonological explanation is not. The vowel length alternation, like most phonological changes in the word, is highly conditioned by morpho-syntactic and semantic phenomena. It is clear that a speaker is not learning phonotactics when s/he learns this kind of sensitivity to form. The authors believe that the errors that participants made are best characterized as 'analogic misanalysis', i.e. mismatches to word-internal paradigms (Schreuder and Baayen 1995). One way to think of this is that less fluent speakers may have less robust paradigms.

Another issue concerns the difference between the 5 groups. For instance, the first group contains the most identifiable errors, and thus the least acceptable to a native speaker. We assumed that fluent speakers would be considerably less inclined to accommodate alternate constructions in these kinds of morpheme arrangements. The opposite is true for group 5. This concerns the difference in the locus, as well as the type, of the errors that are exemplified by these groups.
3 Note that this form is evidence for what we are calling the weakness of the left edge of the word. Young and Morgan 1987 report both <ch'í> and <hadah> as disjunct prefixes from 'position Ib'. This puts them inside the verbal complex. Young and Morgan 1987 also note that <hadah> may be used independently. The final coda consonant in <hadah> is evidence of this independence. This is not an isolated phenomenon. Many disjunct prefixes of a certain class (postpositional indirect objects, for instance (Young and Morgan 1987:44)) exhibit this independence. Navajo speakers are very clear about when a morpheme belongs inside the complex and when it does not. What determines this independence is an unexplored question.
There is evidence that the left and right edges of the word do not have boundaries of similar strength: the right edge is a strongly marked edge, the left a weakly marked edge. McDonough (1999, 2000, 2000b) has shown that the stem, which is the rightmost morpheme in the word, has properties of prominence often associated with the phonological notion 'stress'. The consonants and vowels in stems are considerably longer than any others in the word. The stem is the single place where the full set of consonantal and vocalic contrasts occurs, and pitch range is expanded in this morpheme. These properties have the effect of producing a very distinct auditory profile at the right edge of the word, in particular in the final syllable, the stem. We expect that fluent Navajo listeners reflect the difference between these two word edges in their accommodations to alternate morpheme arrangements at the left edge.

Also, the minimal verb in Navajo consists of two obligatory parts, a verb stem and an adjacent mode/subject morpheme. (For a list of the paradigms of these mode/subject morphemes see Young and Morgan 1987:200ff.) Group 1 contains errors in agreement between these two morphemes. We expect that fluent speakers would recognize these errors, and thus be quicker to react to them and less tolerant of variation among them. The RT measures may reflect this difference.

Since the participants were not controlled for their level of fluency, it is difficult to draw conclusions from the data. We intend to pursue an evaluation of the listeners' fluency by independent means—such as an evaluation of the subject's use of Navajo—and to match it against the results of this experiment. If there is a correlation between the score on the test and the speaker's fluency, this study may provide a viable method for testing fluency. A fluency test of this sort is potentially a valuable community resource.
It would, for instance, provide an objective means for evaluating students who enter a Navajo language class, or it may provide a test for disfluencies in developmental aphasia, or it may help evaluate learning disabilities. We feel that this method shows promise. Further work on this project will have as a component the development of this type of fluency test. We hope to develop this as a standard fluency test for Navajo speakers entering Navajo language classes.
5. Conclusion. The results bear on the issue of the role of context, fluency and morphological structure in the recognition of morphologically complex words. Participants differed in their scores on the test. Not even fluent speakers scored 100%; we take this to be a result of their accommodation of certain kinds of forms. Participant 6, for instance, the subject with the highest score, made no errors on group 1. This is the group whose errors we believe are the most difficult for fluent speakers to accommodate. Participant 2, the subject with the lowest score, made several errors in group 1. While on the whole participants did not do better on one group than another, the response time patterns showed significant differences between the groups. The correct forms were identified with more RT consistency than the incorrect forms. We believe this test is worth developing. We plan a revision of the word list for a baseline study using fully fluent speakers.
Acknowledgements: The work was supported by a grant from the University of Arizona. Our thanks to Eloise Jelinek, Merrill Garrett, Tom Bever, Joel Lachter, Dick Demers, Audrey Holland, none of whom are responsible for the contents. Our thanks also to the audience at the SSILA session of the LSA, January 2000.
Bibliography

Anderson, S. (1992). A-morphous Morphology. Cambridge: Cambridge University Press.
Bybee, J. and C. Moder (1983). "Morphological classes as natural categories." Language 59: 251-270.
Bybee, J. (1995). Diachronic and typological properties of morphology and their implications for representation. In L. Feldman (ed.), Morphological Aspects of Language Processing. New Jersey: Erlbaum, 225-246.
Caramazza, A., A. Laudanna and C. Romani (1988). "Lexical access and morphology." Cognition 28: 297-332.
Hardy, F. (1969). Navajo Aspectual Verb Stem Variation. Ph.D. dissertation, University of New Mexico, Albuquerque.
Hay, J., J. Pierrehumbert and M. Beckman (2000). Speech perception, well-formedness and the statistics of the lexicon. In M. Broe and J. Pierrehumbert (eds.), Papers in Laboratory Phonology VI. Cambridge: Cambridge University Press.
Jelinek, E. (1989). 'Argument type in Athabaskan: Evidence from noun incorporation'. Ms., University of Arizona.
Kari, J. (1975). "The disjunct boundary in the Navajo and Tanaina verb prefix complexes." IJAL 41.
Marslen-Wilson, W. (1987). "Functional parallelism in spoken word recognition." Cognition 25: 71-102.
McClelland, J. and J. Elman (1986). "The TRACE model of speech perception." Cognitive Psychology 18: 1-86.
McDonough, J. (1990). Topics in the Phonology and Morphology of Navajo Verbs. Ph.D. dissertation, University of Massachusetts at Amherst.
McDonough, J. (1999). "Tone in Navajo." Anthropological Linguistics 41.4: 503-539.
McDonough, J. (2000). "On the bipartite model of the Athabaskan verb". In T. Fernald and P. Platero (eds.), The Athabaskan Languages: Perspectives on a Native American Language Family. Oxford University Press, 139-166.
Pierrehumbert, J. (2000). "Why phonological constraints are so granular." SWAP conference proceedings.
Schreuder, R. and R. H. Baayen (1995). "Modeling morphological processing". In L. Feldman (ed.), Morphological Aspects of Language Processing. New Jersey: Erlbaum, 131-154.
Willie, M. (1991). Navajo Pronouns and Obviation. Ph.D. dissertation, University of Arizona.
Willie, M. and E. Jelinek (1997). "Navajo as a discourse configurational language". In T. Fernald and P. Platero (eds.), The Athabaskan Languages: Perspectives on a Native American Language Family. Oxford University Press, 252-287.
Young, R. W. (2000). The Navajo Verb System: An Overview. Albuquerque, NM: University of New Mexico Press.
Young, R., W. Morgan and S. Midgette (1992). An Analytic Lexicon of Navajo. Albuquerque: University of New Mexico Press.
Young, R. and W. Morgan (1987). The Navajo Language. Albuquerque: University of New Mexico Press.
Young, R. and W. Morgan (1980). The Navajo Language. Albuquerque: University of New Mexico.
Young, R. and W. Morgan (1943). The Navaho Language. Salt Lake City: Education Division, United States Indian Service.
Appendix to 'A feasibility study of Navajo word recognition': Word list

The co-authors decided on five classes of errors: (1) pure agreement, (2) aspect matching, (3) valence anomalies, (4) postposition issues, (5) morphological competency. The first column contains the 'bad' (error) forms, the second the 'good' forms. The categories are discussed after the list.
(1) Pure agreement

      'bad'                                    'good'                 gloss
1.1   sikaah (stem: imp)                       siką́                   'container of x sits' (perf)
1.2   yishdee' (stem: perf)                    yishdééh               'wipe' (imp)
1.3   yishzáás (pre: imp)                      yílzáás                'dribble little things' (perf)
1.4   séłhééł (pre: ø/d, stem: future)         séłhį́                  'I killed him' (perf ł)
                                               diyeeshyééł            'I'll kill him' (fut)
1.5   bíníshkáá' (pre: pers/imp, stem: perf)   bíníłkááh (stem: imp,  'you are tracking him down' (imp);
                                                 pre: imp)            'I tracked him down' (perf)
1.6   ahísiskad (stem: perf)                   ahézkad (pre: wrong    'he clapped his hands' (s-perf ł)
                                                 classifier)
1.7   deesdzééh (stem: impf)                   deesdzééł              'I will tan/scrape it' (fut)
1.8   yíní'aah (stem: imp)                     yíní'ą́ (pref: perf)    'you brought something round' (perf)
1.9   ané'į́į́ł (stem: fut, pre: perf)           adínéesh'į́į́ł           'I will steal' (fut)
1.10  yisht'oł (stem: fut, pre: imp)           deesht'oł              'I will wipe it'
1.11  ch'éénisht'eeł (stem: fut, pref: perf)   ch'éénisht'e'          'I let him back out' (perf)
1.12  k'énihizin                               k'éniidzin             'we want friendship'
Type (1) errors are simple mismatches between the tense/aspect of the prefix and the stem. In the example below, <si> is the 1st person singular form of the s-perfective, while the stem <kaad> is in its imperfective form.

(1)  *sikaad
     [ si ]perf   [ kaad ]imp
     s-perf/1s    'kaad':imperf

Since the aspect of the person/number conjugation (perf) must match the aspect of the stem, this constitutes an agreement mismatch between the aspect of the person/number morpheme and the aspect of the stem. In the correct form, the stem is in its perfective form. All of the examples in this section are of this type. This is the most straightforward of the types.
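The type (1) condition, that the mode/subject morph and the stem agree in aspect, can be stated as a simple well-formedness check. The sketch below is purely illustrative: the two dictionaries are hypothetical stand-ins for a lexicon, with aspect labels taken from the glosses above, not a real analysis of Navajo.

```python
# Hypothetical mini-lexicon: aspect of a few mode/subject morphs and stem forms,
# following the glosses in the text (NOT a real Navajo lexicon).
PREFIX_ASPECT = {"si": "perf", "yish": "imp"}
STEM_ASPECT = {"kaad": "imp", "cha": "imp"}

def type1_agrees(prefix, stem):
    """True when the mode/subject morph and the stem share an aspect value."""
    return PREFIX_ASPECT[prefix] == STEM_ASPECT[stem]

print(type1_agrees("si", "kaad"))   # *sikaad: perf prefix + imp stem -> False
print(type1_agrees("yish", "cha"))  # yishcha: imp + imp -> True
```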
(2) Aspect mismatches

      'bad'                                    'good'                      gloss
2.1   siłjid (pre: wrong perf (s))             níłjid                      'I back-carried him' (n-perf)
2.2   néł'a' (stem: perf; ni + s-perf)         déł'a'                      'I send him off' (di + s-perf)
2.3   diniyá (pre: d + n-perf)                 niniyá (needs ni + n-perf)  'I am tired'
2.4   kintahji' ninéłbą́ą́z                      kintahji' niníłbą́ą́z         'I drove it as far as town' (perf)
2.5   dáádi'yishbaał (ø-imper)                 dáádi'nishbaał (n-imperf)   'I'm closing it'; requires n-imperf
                                                                           (it's the prefix that requires it)
2.6   hadah ch'éldloozh                        hadah ch'ééldloozh          translation YM278
2.7   k'íhinistseeł                            k'íhideestsił               'I chopped it in half'
2.8   ch'ínaashááh (the <naa> means 'down')    ch'ínáshááh (the <ná>       'I walk out again'
                                                 is iterative)
2.9   nikinis'eez (pref needs a <di>)          nikidinis'eez               'I have my feet on the ground'
2.10  bik'idiinist'a'                          bik'inaast'a'               'it flew over him'
2.11  bił shich'į' yíní'áázh                   bił shich'į' díní'áázh      'you have started to come visit' (perf);
                                                                           both fine without shich'į'
2.12  bił nani'áázh (pre: imperf, stem: perf)  bił yíní'áázh               'you came with him'
Type (2), 'aspect matching', is similar to type (1) in that the mismatch is between two components in the verb, but there is an additional level of complexity, since the mismatch has an important semantic component. It is not a mismatch in
aspect agreement between the two components. Navajo is a 'verb classification' language; the verb roots exhibit a rich set of semantic specifications that interact with aspect. The examples in this group are mismatches between the inherent 'type' of the verb and the aspect of the prefix complex. Since Navajo can draw much finer aspectual distinctions than English, because these distinctions are morphological and not periphrastic, it is difficult to describe these mismatches with reference to English. However, these mismatches are somewhat similar to distinctions found between verb pairs like finish/stop and the adverbial phrases that can be used with them:

(2)  *he finished for an hour  vs.  he stopped for an hour

with the caveat that there is a great deal more range for this type of construction in Navajo than in English. An example from Navajo follows:
(3)  *siłjid    niłjid    'I back-carried him'

The <si> is the s-perfective. The perfective verb stem <jid> 'back carry' requires the n-perfective infl <ni>; <si> indicates the end of action (atelic). For the differences between these two perfectives see Young and Morgan 1980, 1987 and Young, Morgan & Midgette 1992. These examples constitute a set of forms that indicate a speaker's access to complex knowledge about the semantic mapping between morphological structure and meaning. In these examples, we are simply trying to establish a classification of some of the more common mismatches that speakers were found to make.
(3) Valence anomalies

      'bad'                              'good'
3.1   nihitą́ (object)                    nootą́ (subject)
3.2   shi'aah                            shaa ní'aah
3.3   bikaah                             baa níkaah
3.4   shishóní                           nishshóní
3.5   shi nasht'e'                       shik'í naalt'e'
3.6   nayinishtin (needs obj marker)     nabinishtin
3.7   shi'adą́                            'ashą́'
3.8   shiyííłnaad (two objects)          shiiłnaad
3.9   nihicha (object; needs subject)    wohcha / nihiłcha
3.10  bideestseeł (no 3rd object needed) yideestseeł
Type (3) are valence anomalies. Valence anomalies are violations of the valence of the verb constituent (classifier + stem) by inappropriate pronominal marking in the verbal complex. For example, in (3.1) the <nihi> is the dual object marker, but the verb is intransitive ('a stick-like object'); it requires a subject marker. There are several types of violations of this sort in these examples. In (3.2) and (3.3) the valence requires marking for three arguments for the ditransitive ('give'). The errors, which were taken from actual mistakes that speakers made on written exams, fail to mark the subject and mark the indirect object as a direct object.
(4)
*shi’aah
[sh-i]      [’aah]
[1st DO]    [give]

shaa ní’aah
shaa (IO)   [ní]        [’aah]
me-to       [2nd Su]    [give]
‘You give it to me (roundish object)’
In the correct forms the indirect object, the recipient, is marked in the postposition. In (3.4) the correct form is the neuter. Neuters do not allow a 1st person imperfective subject marker. The correct neuters require
an n-aspect form (see Y&M). In (5) the form sounds as if the speaker is trying to say ‘it dropped on me’, and so has put the first person prefix on the verb. The correct form is ‘it dropped down on me’; the shi- is part of a postposition with ‘down’ and not a prefix. The valence is transitive, not ditransitive. (5)
*shi nasht’e’

shik’í naalt’e’
shi-k’í    [na [i] [l-t’e’]]
me-down
Some of the forms have two object markers (3.7, 3.8). Some show a bi/yi switch; they require a bi- object (3.6, 3.10). (3.9) has an object marker where the verb is intransitive, requiring a subject marker. (6)
*nihicha
[nihi]     [cha]
[2nd DO]   [cha]

wohcha
[oh]       [cha]
[2nd Su]   [cha]

nihiłcha
[nih-i]            [ł-cha]
[2nd DO - 3rd Su]  [cl-stem]
The correct form has a ł- classifier (making it transitive), or uses the 2nd dual subject marker [oh].
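The valence-anomaly category above amounts to a simple criterion: the arguments a form marks must match the valence its verb constituent requires. A toy sketch (our illustration, not an implemented model from the paper; the stems, valence figures, and marker labels are assumptions for exposition only):

```python
# Toy valence checker (illustrative only; not from the paper).
# A form is anomalous when the number of arguments it marks differs
# from the valence its verb stem requires.
VALENCE = {
    "'aah": 3,   # assumed: ditransitive 'give (roundish object)': subject, object, recipient
    "cha": 1,    # assumed: intransitive 'cry': subject only
}

def valence_anomaly(stem, marked_args):
    """True when the marked arguments do not match the stem's valence."""
    return len(marked_args) != VALENCE[stem]
```

On this toy criterion, a form like *shi’aah, which marks only a direct object for the three-place ‘give’, comes out anomalous, while wohcha, marking a subject for intransitive ‘cry’, does not.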
(4) Postposition

       ‘bad’                                     ‘good’
4.1    yíní’áázh                                 bił yíní’áázh       ‘you came with him’ (perf stem is dual)
4.2    naash’aash                                bił naash’aash      ‘I am walking with him’ (stem is dual, imp)
4.3    shiką́ (2 args, needs 3)                   shaa yiníką́         ‘he gave it to me’ (perf; ‘me’ pp)
4.4    ninahiłniih                               nahíłniih           ‘you bought it’ (this one’s too hard; see YM 528)
4.5    yee nahashne’                             baa nahashne’       ‘I talk about him’ (imperf)
4.6    shich’į’ désdzil (1st person)             bich’į́’ désdzil     ‘I put my strength to it’ (perf)
4.7    haashkai                                  bił haashkai        ‘I came up with them’ (kai = pl)
4.8    da haashkai                               bił haashkai        ‘I came up with him’
4.9    bił daahashyá
4.10   bił yishke’ (ke’ is dual)                 yiike’              ‘we (2) were left behind’ (perf; YM:795 transitional)
4.11   yiijéé’                                   biih yiijéé’        ‘we 3 ran into it’
Type 4 are postposition errors. These forms test a listener’s proficiency with the use of the postpositions. In some there is no postposition where one is needed. This occurs when the verb is ditransitive and needs three arguments, one of them realized in the postposition; thus this is a kind of valence error. We have also included one example (4.10) where the form contains a postposition and does not need one. In (4.3) the verb stem is ditransitive: it needs three arguments, with the ‘to me’ in a postposition. There are several examples of this kind.

(7)
*shiką́
[sh-i]             [ką́]
[1st DO - 3rd Su]  [ką́]

shaa yiníką́
shaa     [yiní]       [ką́]
me-to    [3rd DO - ]  [ką́]
‘he gave it to me’
The form in (4.10) is the opposite of the others (4.1, 4.2, 4.7, 4.11). It has the postpositional phrase and doesn’t need it: (8)
*bił yishke’
bił   [ish]        [ke’]
      [1st Su]imp  [ke’]imp

yiike’
[ii]           [ke’]
[2nd Dual]imp  [ke’]imp
‘we (2) were left behind’
The forms in (4.7) and (4.8) are related to each other by the error in the postpositional argument. The verb stem needs a plural object. The form in (4.7) has a singular object, and (4.8) has the wrong postposition. (9)
*haashkai
*da haashkai

bił haashkai
bił       [ha #  [ish]imp    [kai]imp]
him-with  [‘up’  [1st Su]imp [kai]]
‘I came up with him’
In (4.6) the postposition contains the wrong argument: 1st person instead of 3rd person agreement. (10)

*shich’į’ désdzil
me-‘up to’   [désh]perf   [dzil]perf

bich’į́’ désdzil
it-‘up to’   [désh]perf   [dzil]perf
‘I put my strength to it’
(5) Disjunct prefix

       ‘bad’                                            ‘good’
5.1    da nihiłhózhǫ́ (da in wrong place; nihił is pp)    nihił dahózhǫ́        ‘we are happy’
5.2    da bił hooghan (as above)                         bił dahooghan        ‘they live with them’
5.3    honaagaii                                         naahoogaii           ‘the whole area became white again’
5.4    daanáácha                                         náádaacha            ‘they are crying again’
5.5    náá nihiłhózhǫ́ (misplaced)                        nihił nááhózhǫ́       ‘we’re happy again’ / ‘you (2) are ...’ (either)
5.6    na’ádahashniih (misplaced)                        ’ádanahashniih       ‘I’m buying it for myself’
5.7    nihóneez (goes with neez in non-neuter)           hóneez               ‘it (area) is long’
5.8    shich’į’ ádídeeshxééł (double marking on IO)      ádích’į’ deeshxééł   ‘I will take it to myself’ (big load)
5.9    yináágish (in wrong place)                        nááneigish           ‘he’s cutting it again’ (impf)
5.10   nááhanádááh                                       hanáánádááh          ‘he’s coming up again’ (impf)
Type 5 are disjunct displacements. In these examples, the disjunct prefixes appear in the wrong position with respect to the postpositions. In the pair below, nihił is the 2nd dual marker ‘with us’ and da is the distributive marker. The morpheme da is a disjunct morpheme; as such, its place is at the edge of the verb word, following the postposition, not outside the postposition as it is placed in the starred form below. (11)
*da nihiłhózhǫ́

nihił dahózhǫ́
‘we’re happy’
However, in this starred form, the da is placed outside the 2nd dual marker, the postpositional nihił ‘with us’. This dual marker is not part of the agreement complex within the (I) constituent of the verb, where the presence of the disjunct da would force it to be. The gloss, of course, is not an accurate representation of the syntactic structure, which is more accurately glossed as ‘it is peaceful, among us’. The form is a neuter imperfective construction (Young and Morgan 1987:356). (12)
nihił dahózhǫ́
nihił       [da # [hó] [zhǫ́]]
‘with us’   distr # [3rd area] [be peaceful]
The incorrect form misplaces the nihił inside the conjunct domain: (13)
*danihił hózhǫ́
[da # [(nihił) hó] [zhǫ́]]
*distr # [(‘with us’) 3rd area] [be peaceful]
All the forms in this group contain errors of this sort.
Joyce McDonough
Department of Linguistics
University of Rochester
Rochester, NY
[email protected]
Mary Ann Willie
Department of Linguistics
University of Arizona
Tucson, AZ
[email protected]
University of Rochester Working Papers in the Language Sciences—Vol. Spring 2000, no. 1 Katherine M. Crosswhite and Joyce McDonough (eds.)
Comparison of intonation patterns in Mandarin and English for a particular speaker

Katrina Schack ([email protected])
Departments of Linguistics and Mathematics
University of Rochester
Abstract In this paper I will address two questions regarding intonation: first, what do intonation patterns look like in a specific variety of Chinese, and second, how does a native speaker of that language interpret intonation in English? This research indicates that this speaker's Chinese intonation patterns do not display the pitch register distinction posited for Beijing Mandarin. However, she does use both pitch range expansion and high boundary tones as methods for distinguishing statements from questions. Her English intonation system displays a much denser assignment of tonal targets than that of a native speaker of English. She demonstrates a potential knowledge of pitch accent for words spoken in isolation, but she continues to apply the same tonal pattern to individual words even in the context of a larger utterance, thus using a system that more closely represents lexical tone. However, she demonstrates knowledge of English boundary tones. Thus, this research provides evidence for the way in which specific aspects of one's native language may be systematically applied to a very different system.
1. Introduction: Lexical Tone vs. Intonation Lexical tone is a particular tonal pattern assigned in the lexicon, and this assignment is contrastive. For example, in Mandarin the word ma ‘mother’, pronounced with a high level tone, is distinct from ma ‘scold’, pronounced with a falling tone. Intonation, on the other hand, is a tune assigned over an entire utterance. Rather than distinguishing lexical items, it distinguishes different meanings for sentences. The interpretation of an intonation pattern is determined in the lexicon, and the lexicon also provides a way of attaching the tune to a text, but the tune is not attached to any particular utterance in the lexicon. For example, in English, a certain type of rising ending on an utterance indicates that it is a question. It has been argued that the existence of lexical tone does not prevent the existence of intonation (see for example Ladd and Hirst and Di Cristo). However, not much is known about how lexical tone and intonation interact. According to one view, both lexical tone and intonation patterns are specified as an abstract sequence of high and low tones (Ladd, Peng). These tones have no absolute physical value. Rather, they are implemented through the manipulation of pitch, the fundamental frequency (f0) of the voice, which rises and falls to meet these tonal targets. Now, Chinese is a tone language, while English is an intonation language. That is to say, if Chinese has both tone and intonation, then Chinese assigns tonal targets on a lexical as well as a phrasal level, while English only assigns an intonation tune on a phrasal level. Thus, in Chinese, the dual usage of tone leads to a more complicated picture than is found in English, making it more difficult to separate lexical tone from intonation. Moreover, it is not clear how a native speaker of a tone language would deal with tone in English, since English uses intonation but makes no specifications for lexical tone. This paper will investigate this issue.

http://www.ling.rochester.edu/wpls/s2000n1/schack.pdf
2.1. Background: Lexical Tone in Mandarin Chinese The first issue one confronts in examining what a Mandarin speaker does in English is to determine what she does in Mandarin. Moreover, in order to distinguish intonation from lexical tone movements, it is necessary first to examine the characteristics of lexical tone in isolation. Mandarin Chinese specifies four lexical tones. They will be referred to in this paper in the standard way: in isolation, first tone is a high level tone (1), second tone is a mid-rising tone (2), third tone is a falling-rising tone (3), and fourth tone is a falling tone (4). In addition, syllables may be lexically assigned a "neutral tone", or more accurately, they fail to receive a specification for lexical tone. This is generally the underlying tonal specification (or lack thereof) for syllables that are never stressed and for particles. Although these tones are known to vary greatly even over small geographic areas (Giet 1946, 1950), it was found that the consultant's lexical tones, at least in careful speech in isolation, are in line with those of Beijing Mandarin. These tones will be transcribed in this paper as a numeral following the standard pinyin transcription of a word.
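The transcription convention just described (a numeral after the pinyin syllable, no numeral for neutral tone) can be made concrete with a small helper. This is our illustration, not part of the paper; the function name and behavior are assumptions for exposition:

```python
import re

# Illustrative parser for the tone-numeral convention used in this paper:
# a numeral 1-4 after a pinyin syllable marks its lexical tone; a syllable
# with no numeral (e.g. the particle "ma") is treated as neutral (None).
def parse_syllable(s):
    m = re.fullmatch(r"([a-z]+)([1-4])?", s)
    if m is None:
        raise ValueError(f"not a toned pinyin syllable: {s!r}")
    syllable, tone = m.group(1), m.group(2)
    return syllable, int(tone) if tone else None
```

For example, parse_syllable("zhao4") returns ("zhao", 4), and parse_syllable("ma") returns ("ma", None), the neutral-tone particle.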
2.2. Background: Intonation in Mandarin Chinese Research on intonation patterns in Mandarin is somewhat rare, and as a result there are few general conclusions as to what the intonation patterns are.
Schack—Comparison of intonation patterns in Mandarin and English
Moreover, the majority of the research has been carried out on Mandarin as spoken in Beijing. Most notable among those making early auditory characterizations, Chao (1968) lists 13 basic intonation patterns for Mandarin Chinese. Many of these 13 intonation patterns are, however, emotive rather than purely linguistic distinctions. In addition, he maintains the idea that two particles in Chinese were phonetically realized only as a rising tone or as a falling tone at the end of a sentence in certain contexts. Chao also makes the observation that: “In questions ending in ma the sentence intonation is usually fairly high ...” (801). Thus, Chao's observations argue that intonation in Mandarin can be realized either as a successive addition of intonation to the end of a lexical item, thus changing the shape of the lexical item, or as a simultaneous addition that will affect the entire sentence melody. Later work, based on instrumental measurements, rejects the idea that intonation may be realized as the final addition of a high or low tone to the lexical tone of the final word of an utterance. Ho (1977), for example, demonstrates that while the shape of a final lexical tone may be compressed or expanded, it nevertheless retains its final fall or rise in the context of a declarative, interrogative, or exclamatory utterance. Ho's data also shows a basic distinction in tone register between statements and questions; a much higher pitch is used throughout a question than in a statement. More recently Shen (1990) has demonstrated that Beijing Mandarin is characterized by three basic intonation tunes, generalized in Figure 1.
[Schematic pitch contours (f0 over time) for three tunes: Type I, Type II, Type III]

Figure 1: Shen's Intonation Tunes (26)
Tune I is used for assertive intonation, Tune II for unmarked questions and particle questions, and Tune III for A-not-A questions. Thus, Shen concludes that the primary prosodic distinction between a statement and a question in Chinese is the significantly higher pitch at the beginning of an utterance. Certain types of questions then continue in a higher register throughout the utterance, while other types of questions fall to the same ending point as that of a declarative utterance. Thus, she concludes that it is the register rather than the contour of the pitch that has intonational significance for tone languages (72). She amends this statement, though, by pointing out that the general pitch contour shown above is a result of intonation and not a result of lexical tone (75), but she does not recognize any successive tone addition as being the result of intonation. Kratochvil (1998) and Garding (1984) claim that Chinese intonation is characterized by a grid of two lines that may be narrowing or widening, rising or falling throughout the utterance. Between these two lines the tonal targets are placed. Kratochvil specifically mentions pitch range expansion as being an intonation pattern characterizing focus. Xu verifies this statement with his close examination of the effect of focus on short declarative sentences in Mandarin. In addition, he also demonstrates that the lexical tones remain distinctive even though they are modified as a result of the tonal context of both surrounding lexical tones and focus intonation, and he asserts that lexical tone and focus are the primary determiners of f0 in short declarative Mandarin sentences. With this background, then, it is not entirely clear what one should expect to find in Mandarin intonation. Moreover, the tendency to find great variety in lexical tone even over a small region and within the same dialect at least suggests the possibility that variety may exist in intonation patterns as well.
3. Methodology The scope of this investigation is limited to a case study of a particular speaker of the variety of Mandarin Chinese that is spoken in Pang, a small village in Hebei Province, China, approximately 100 miles south of Beijing. The informant is a 23-year-old female who lived in this village until the age of 16 when she moved to New York State. She was educated in her village, and much of this education took place in standard Mandarin; moreover, her current use of Chinese is primarily among Mandarin-speaking students for whom the Beijing variety is prestigious. It is only in talking with her family, primarily by telephone, that she uses the Pang variety. Nevertheless, she states that her Chinese is strongly accented. In my investigation of her Chinese intonation patterns I made use of a portion of the sentences Shen used in her experiment. Although my speaker
found many of them odd grammatically and/or semantically (in contrast with the opinion of Shen's informants), she did not seem to think this would affect the way she read them, and so she did not change them. Seven statements were examined, all of which contained within the utterance only syllables of one lexical tone. In addition, the corpus included three different types of yes-no questions formed from these sentences. The first set consisted of unmarked questions, that is, questions that were lexically identical with the statements. Questions marked with the particle ma, a particle that is added to the end of a statement, made up the second set. In addition, there were four questions, one for each lexical tone, representing the A-not-A pattern, a question form that is formed by following the verb with its negation. Only those statements containing a direct object could fit this syntactic construction. Each statement consisted of either 4 or 5 syllables; the other sentences acquired more as was necessary. Each type was repeated three times for a total of 75 tokens. The pinyin transcriptions of these sentences, along with their English interpretations, are included in Appendix A. The consultant was recorded using a Tascam DAT at 44.1k via a Shure head-mounted microphone. The recording was done in a small classroom with a minimum amount of background noise. The speaker was previously asked to go through the lists and familiarize herself with them; words she was not comfortable with were changed. She was then presented with the lists, each one of which she read three times. She reported that she carried out the second reading of the Chinese sentences in a "different" manner, although she was unable to specify exactly what this manner was. To the ear of someone who knows only a little of the language, it sounded less formal than the others. The data was then transferred to computer files by way of Sound Edit and analyzed using Pitchworks on a Macintosh.
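The corpus size reported above can be checked with a line of arithmetic. This is our own tally, not code from the study; the category labels are ours:

```python
# Arithmetic check of the Chinese corpus described above: 7 statements,
# 7 unmarked questions, 7 ma-questions, and 4 A-not-A questions,
# each read three times.
sentence_counts = {"statement": 7, "unmarked": 7, "ma": 7, "A-not-A": 4}
repetitions = 3
total_tokens = sum(sentence_counts.values()) * repetitions
print(total_tokens)  # 75, matching the total reported in the text
```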
The theoretical model used for this paper is ToBI (Tone and Break Index), a system developed by Beckman and Pierrehumbert (1986). ToBI provides a method for marking high and low tonal targets in a sentence and distinguishing the varying combinations that may occur. This system was developed for American English with the theory that it could be extended to other languages as well. However, other languages use different features in determining intonation patterns, and as we shall see, one important feature this system lacks is that of pitch range expansion; the use of this feature here follows Svetlana Godjevac's work on Serbo-Croatian.
4.1. Chinese Results: Register Tone In order to compare this data to Shen's work, it is necessary to make f0 measurements at the beginning and end points of each utterance, as well as at the high and low points of the pitch contour. The final results for this data are
calculated both with and without the particle ma, as the pitch assigned to it varied depending on the lexical tone of the preceding syllable. When the average was taken for all the sentences and plotted using Excel, the results decidedly do not agree with Shen's data. Charts are pictured below (Figures 2 and 3).

[Chart: average F0 (Hz) at the sentence positions beginning, high, low, and end, for statements, unmarked questions, A-not-A questions, and ma questions]
Figure 2: The average F0 of the beginning, end, high, and low tones in Shen's study (19)

Shen's data demonstrates a clear distinction between the high register of the unmarked and ma questions and the low register of statements, and it also shows the A-not-A questions beginning in the high register and ending in the low register. When the average values are taken for the Pang speaker, though, all four types of sentences fall within 20 Hz of one another at the beginning and high points, meaning that the difference is not perceptible and therefore probably not significant (Rossi (1971), referenced in Shen, 19). When the ma is excluded, ma questions are about 27 Hz higher than the others at the low point. Unmarked questions are the only type that distinguish themselves at the end point, being about 30 Hz higher than the others. Thus, on average, the consultant decidedly failed to display the same distinction in register tone that Shen's research demonstrated.
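The comparison just described can be sketched as a small routine: average the f0 measured at one sentence position over all tokens of a sentence type, then treat two types as distinct at that position only if their averages differ by at least the 20 Hz perceptibility threshold (Rossi 1971, as cited via Shen). This is a minimal sketch, not the original analysis script, and all numbers used with it below are invented for illustration:

```python
# Minimal sketch of the register comparison described above.
THRESHOLD_HZ = 20.0  # perceptibility threshold (Rossi 1971, via Shen)

def mean_f0(tokens, position):
    """Average the f0 values measured at one position over all tokens."""
    values = [t[position] for t in tokens]
    return sum(values) / len(values)

def distinct_pairs(position, tokens_by_type, threshold=THRESHOLD_HZ):
    """Sentence-type pairs whose average f0 at `position` differs perceptibly."""
    means = {k: mean_f0(v, position) for k, v in tokens_by_type.items()}
    types = sorted(means)
    return [(a, b)
            for i, a in enumerate(types)
            for b in types[i + 1:]
            if abs(means[a] - means[b]) >= threshold]
```

With invented beginning-point averages 14 Hz apart, distinct_pairs returns no pairs, mirroring the finding that all four types fall within 20 Hz of one another at the beginning and high points.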
Figure 3: The f0 averages of the beginning, end, and high and low points in the pitch contour of the present study

Since these sentences are carefully regulated for tone, and since the speaker stated that at least one of the readings was completed in a different manner from the others, it is to some degree questionable how accurate the results taken from the averages are. Since, however, normal speech is not regulated for tone, and since many different manners of speaking can be adopted, the averaged results probably most accurately reflect normal speech. Nevertheless, the given measurements of fundamental frequency do not characterize the intonation patterns of the Pang speaker well, and thus observational generalizations are used. 4.2. Chinese Results: Boundary Tone and Pitch Range The following four pitch tracks demonstrate the pitch movements for a statement consisting of each of the four tones.
[Pitch track spower3: hong2 bi2 tou2 mei2 quan2]

[Pitch track sclean3: bao1 shen1 gong1 ca1 che1]

[Pitch track sbuy1amp: lao3 shou3 zhang3 mai3 jiu3]

[Pitch track spicture1: guo4 lu4 ke4 zhao4 xiang4]
Figure 4: Chinese statements, 1st, 2nd, 3rd, and 4th tone

Most of the pitch movement within these utterances is the result of lexical tone, which may be modified by the surrounding lexical tones. There is also some indication of movement similar to Shen's Tune I (see Figure 1). Observationally there are few differences between statements and unmarked questions. The primary difference is found in the potential existence of a high boundary tone, which manifests itself in slightly different ways depending on the lexical tone of the final word. In 1st tone sentences the last syllable (or two syllables) maintains its tonal shape from the statement but is moved up about 20 Hz, so that it is slightly higher than the third syllable rather than slightly lower, as it is in the statement (see Figure 5, c.f. Figure 4).
[Pitch track uqclean2: bao1 shen1 gong1 ca1 che1, with final H%]
Figure 5: Unmarked question, 1st tone

In 2nd and possibly in 3rd tone sentences, the lexical final rising pitch is extended so that it rises more than it does in the statement equivalents. Often, but not always, this is perceptually more salient. 4th tone sentences display a leveled-out fall instead of a fall with an even slope. These variations are shown in Figure 6.

[Pitch track uqpower3: hong2 bi2 tou2 mei2 quan2, with final H%]

[Pitch track uqbuy1: lao3 shou3 zhang3 mai3 jiu3, with final H%]

[Pitch track uqpicture1: guo4 lu4 ke4 zhao4 xiang4, with final H%]
Figure 6: Unmarked questions, 2nd, 3rd, and 4th tone

This rise indicates the existence of a high intonational target which affects the realization of the final lexical tones in various ways. Although the lexical tones are modified in different ways, there is always a higher pitch involved. The question particle ma has an underlying neutral tone, and it has been theorized that it receives its tonal pattern from the sentence intonation (see Shen (39) and Ladd). For this data, however, it is clear that while intonation may play a role, the preceding lexical tone plays the primary role in determining the pitch assigned to ma. In general the ma questions are most comparable to unmarked questions, although here again the exact way in which this holds true is dependent on the final lexical tone assignment in the utterance. In 1st tone sentences the ma demonstrates a fall of about 125 Hz, whereas the sentence apart from the ma follows the same pattern as the unmarked question does (Figure 7, c.f. Figure 6).

[Pitch track maclean1: bao1 shen1 gong1 ca1 che1 ma, with final H%]
Figure 7: Ma question, 1st tone

In 2nd tone sentences, the ma also displays a fall at the end, but only of about 30 Hz. The first part of the ma continues the rising pattern of the final 2nd tone syllable, which in turn more closely resembles the rise occurring in the context of the statement than in the unmarked question (Figure 8, c.f. Figures 5 and 6).

[Pitch track mapower2: hong2 bi2 tou2 mei2 quan2 ma, with final H%]
Figure 8: Ma question, 2nd tone
For 3rd and 4th tone sentences (Figure 9), the ma is simply assimilated into the sentence-final tonal pattern of the unmarked question. Thus, the tonal pattern assigned to the last syllable in an unmarked question (4th tone) or statement (3rd tone) will instead be assigned to this syllable combined with the ma in the context of a ma question.

[Pitch track mabuy2: lao3 shou3 zhang3 mai3 jiu3 ma, with final H%]
Figure 9: Ma questions, 3rd and 4th tone

Thus, the evidence indicates that for these three types of sentences the pitch movement is primarily determined by lexical tone assignments, combined with the existence of a high boundary tone for the questions. The odd interaction of this boundary tone with the final lexical tone and with the particle ma has previously been documented for Mandarin speakers (Shen, 41), but it remains
unexplained. The only feasible explanation in light of any of the current intonation models is that the pitch is the result of the application of a phonological rule on the tonal tier which alters the tonal specifications. The A-not-A sentences display a clearly different intonation pattern from the declarative sentences. This manifests itself as a widening of the pitch range over some combination of the verb, the negation word, and the following verb. As this is not expressible in the standard ToBI model, which makes specifications only for H and L, the added feature of pitch range expansion is necessary, transcribed as < > (Godjevac). If the verb is underlyingly 1st or 4th tone, its first occurrence is considerably higher than the preceding word. Following Kratochvil's model, these syllables begin at the high point of the expanded pitch range. For 4th tone, a falling tone, the fall takes place in the first occurrence of the verb, leaving mei2 ‘not’ relatively flat, while for 1st tone, a level tone, the fall takes place in mei and the first occurrence of the verb is flat, as seen in Figure 10. It seems that along with the widened pitch range there is a necessity to cover the entire range, but the exact way in which this happens is a function of lexical tone.

[Pitch track bupicture1: guo4 lu4 ke4 zhao4 mei2]
[Pitch track buclean2: bao1 shen1 gong1 ca1 mei2]
Figure 10: A-not-A questions, 4th and 1st tone

If, on the other hand, the verb is underlyingly 2nd or 3rd tone, its first occurrence begins at about the same height as it would for the equivalent statement and may fall slightly, but it fails to display the final rise that it does in the context of a statement (Figure 11, c.f. Figure 5). Rather, mei2 ‘not’ displays a rise in f0. The final verb differs despite the grammatically necessary shared tone. You3 ‘have’ simply falls (it is used as the main verb in this pattern as the opposite of mei2 ‘not’, the main verb used in the equivalent statement), while mai3 ‘buy’ displays a falling-rising pattern beginning and ending at roughly the same f0. This is most likely a result of the fact that you3 tends to be much more strongly influenced by surrounding lexical tones than most words are.

[Pitch track bupower3: hong2 bi2 tou2 you3 mei2]
[Pitch track bubuy3: lao3 shou3 zhang3 mai3 mei2 jiu3]
Figure 11: A-not-A questions, 2nd and 3rd tone

In either case, however, the feature of pitch range expansion is definitely apparent, especially in comparison to the equivalent statements (Figure 4). 4.3. Summary of Chinese Data To summarize, this research uncovers several distinct intonation patterns for Pang Mandarin. Although statements do generally follow the pitch curve suggested by Shen (Figure 1), the pitch register distinction that she and others posited for Beijing Mandarin is not present. Rather, the consultant makes use of high boundary tones and pitch range expansion to distinguish various forms of interrogative utterances from declarative utterances. The realization of the boundary tone is particularly strongly affected by lexical tones in questions ending with ma, an effect that has been documented in various sources. This behavior seems to be best explained with a phonological rule altering the tonal specifications. The use of pitch range expansion has been demonstrated to be a result of focus in Beijing Mandarin (Kratochvil, Xu); it is impossible to tell from this research whether the A-not-A question is actually making use of focus, but nevertheless the intonation pattern is clearly demonstrated. It was impossible to describe the results of this pattern using standard ToBI notation; rather, a new feature needed to be added, that of pitch range expansion. A variety of things could account for the difference between the consultant's speech and standard Beijing Mandarin, but the strongest possibility is the differences in the way the two varieties are spoken. Although the information was unavailable at the time the research was conducted, it was later discovered that questions are to some degree not even formed in the same way in the Pang
variety as they are in the Beijing variety, and thus there is a strong possibility that the intonation reflects the Pang variety.
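The pitch-range-expansion feature (< >) used in the summary above can be sketched as a simple measurement: compare the f0 span (max minus min) over the verb-plus-negation region of an A-not-A question with the span of the corresponding words in its declarative counterpart. This is our illustration, not the study's analysis; the input lists and the 1.5 threshold ratio are assumptions:

```python
# Sketch of a pitch-range-expansion check (illustrative; not from the paper).
def f0_span(samples_hz):
    """Width of the f0 range covered by a list of f0 samples (Hz)."""
    return max(samples_hz) - min(samples_hz)

def is_expanded(question_region_f0, statement_region_f0, ratio=1.5):
    """True if the question region covers a markedly wider f0 span
    than the same region of the equivalent statement."""
    return f0_span(question_region_f0) >= ratio * f0_span(statement_region_f0)
```

A region spanning 200-330 Hz in the question against 230-290 Hz in the statement would count as expanded under this assumed threshold, in the spirit of the widened two-line grid of Kratochvil's model.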
5. English Background It will become immediately clear to the reader who is familiar with English intonation patterns that the patterns discovered for Chinese are much different than those found in English. According to the ToBI system, English intonation tunes can be transcribed with three different kinds of tone: a pitch accent, a phrase tone, and a boundary tone. Each of these can be specified as either high or low, and minimally an utterance must contain one of each type (see Pierrehumbert & Beckman (1986), Ladd, and Hayes). A pitch accent (*) is aligned with a prominent syllable in the utterance, while the phrase (-) and boundary (%) tones occur at the edge of a domain. For example, a standard English declarative intonation tune is H* L-L%, as shown in the Figure 12, ÔAllen married MarieÕ. The focus is on ÔAllenÕ, moving the pitch accent to the first word in the utterance from its default position on the final word. All other pitch movement in the sentence is simply a result of movement toward the three tonal targets. tallen
Figure 12: English declarative intonation

How, then, would a native speaker of Mandarin interpret this system?
WPLS:UR, vol. S2000, no. 1
6. Methodology

The English sentences were designed to be as grammatically equivalent to the Chinese sentences as possible, this seeming to be the best way of determining whether similar strategies were used in the two languages. Thus, I used basic statements; unmarked, or echo, questions; basic yes-no questions using 'did'; and yes-no questions with 'or not' added to the end. The words making up the sentences were chosen with an attempt to minimize the number of non-sonorants and to vary the location of the stress. There were six sentences in each category, all of which were basic SVO sentences consisting of three to five words in their statement form. Each was repeated three times for a total of 72 tokens. In addition, a word list was recorded of English words one to four syllables in length with varying locations of stress. The sentences were recorded at the same time and in the same manner as the Chinese sentences. A male monolingual native speaker of English, age 23, was also recorded for the sake of comparison.
7.1. English Results

It was found that the Chinese speaker used the same general intonation patterns for the statements and 'or not' questions, and for the 'did' and unmarked questions. As a result, only the intonation patterns of the statements and the 'did' questions will be analyzed. For the sake of comparison, the pitch tracks of a native speaker of English for a basic statement, 'Allen married Marie', and a basic question, 'Did Allen marry Marie?', are displayed.
[Pitch tracks: statement "allen married marie" with tones H* L-L%; question "did allen marry marie" with tones L* H-H%]
Figure 13: Statement vs. question, English speaker

As is typical of short utterances in English, each of these utterances consists simply of a pitch accent, a phrase tone, and a boundary tone. The word 'Allen' is emphasized, making both of these focus constructions and thus aligning the pitch accent with 'Allen'. They are distinguished from one another by opposite choices for the pitch accent as well as opposite choices for the phrase and boundary tones. The boundary tone distinction gives rise to the well-known rising endings of English questions. The same two sentences spoken by the Chinese speaker appear as follows:
[Pitch tracks, Chinese speaker: "allen married marie" with syllable-by-syllable tones L+ H+ L !L+ H+ L-L%; "did allen marry marie" with tones L+ H*+L L- H%]
Figure 13: Statement vs. question, Chinese speaker

These two pictures may be taken as typical of the consultant's English speech, in that all but two of the statements display this same general pattern, as do all of the questions. As in English, the statement displays a falling ending while the question displays a rising ending. Although the consultant does apply boundary tones in Chinese, the boundary tones here are much more drastic than anything demonstrated in her Chinese speech, and thus she is applying knowledge of an English intonation pattern. However, the way in which she applies it looks very different from the way a native speaker of English applies this same pattern.
The rise of the question does not occur until the final syllable, no matter where the prominent syllable in the utterance might be. Thus, if the pitch trace is seen as an interpolation between two tonal targets, it is apparent that the previous tonal target must also occur within either the ultimate or penultimate syllable of the utterance. Moreover, there is generally a much greater amount of pitch movement occurring within the utterance for the Chinese speaker than there is for an English speaker, indicating a denser assignment of tonal targets. Although the pattern shown in the statement above might occur for an English speaker if both 'Allen' and 'Marie' were stressed in the utterance 'Allen married Marie', this sort of pattern would certainly not occur in a longer utterance. However, for the Chinese consultant, this pattern becomes even more prominent in longer utterances.
[Pitch track: "amy didn't marry ryan, amy married william" with downstepped tones L+H*+L !L+H*+L !H*+L- L+H*+L H*+L !L+H*+L-L%]
Figure 14: Longer utterance demonstrating dense assignment of tonal targets, Chinese speaker

Not only does she again demonstrate a much greater amount of pitch movement, implying a much denser assignment of tonal targets, but she tends to repeat the same L+H+L pattern, downstepped throughout the utterance, most often applying it to individual lexical items. Throughout the data this L+H+L pattern is consistently applied to every "important" word in an utterance, unless the pattern is overridden by a high boundary tone. Minimally, important words include nouns and some verbs. This is certainly not typical of English intonation. However, it will be demonstrated that this pattern is arguably based on the consultant's interpretation of the tonal pattern she assigns to a word in isolation, in the same way that lexical tone is interpreted in her native language of Chinese.
In isolation, the consultant consistently applies a rising tone to the stressed syllable and a falling tone to the final syllable of a word. This pattern will be shown for one and two syllable words. A one syllable word will usually receive a rising, then falling tone, although in a few tokens the rise was less than 20 Hz, and thus below the perceptual threshold. The rise is approximately 1/3 to 1/2 as large as the fall.
[Pitch tracks: 'new' and 'name' spoken in isolation]
Figure 15: One syllable words in isolation, Chinese speaker
A two syllable word with initial stress is best fitted to this prosodic pattern; it always contains the rise on the first syllable and the fall on the second, with the syllable boundary occurring at the peak.
[Pitch track: 'yellow' spoken in isolation]
Figure 16: Two syllable word, stress initial, in isolation, Chinese speaker

A two syllable word with ultimate stress presents more difficulty. The first syllable is assigned a flat or slightly falling tone, from which the second syllable rises and then falls. This fall is cut short in only one of the two syllable words among the data, 'unite'; however, it is also the only word ending in a stop, and this is likely to be the cause.
[Pitch track: 'marie' spoken in isolation]
Figure 17: Two syllable word, stress final, in isolation, Chinese speaker

For a native speaker of English the stress pattern of a word in isolation is lexically determined, and stress plays an important role in assigning intonation in connected speech (Hayes). The intonation tune assigned to a word in isolation, on the other hand, does not carry any lexical significance. Rather, in isolation, a word is assigned one of the English intonation patterns that could be assigned to any other utterance; most significantly, it must consist of a pitch accent, a phrase tone, and a boundary tone. However, in Chinese, the underlying tones are both lexically determined and contrastive, and thus for the Chinese speaker these are important when assigning tonal patterns in connected speech. As a result, connected speech maintains a dense pattern of tonal targets, many of which are derived from the tonal targets associated with a lexical item in isolation. The consultant, in fact, uses the second of these strategies in assigning tunes to connected speech in English, thus producing very different intonation curves from those one sees from native speakers of English. Moreover, she assigns these patterns very consistently; in only a few cases did the intonation pattern show any variance among the three utterances of a sentence. Thus, although the consultant is applying knowledge of English intonation patterns to words spoken in isolation, she does not interpret a longer utterance in terms of pitch accent; rather, she repeatedly applies the intonation pattern of a word spoken in isolation to many words occurring within a longer utterance, as she would in Chinese. The only exception is found in the realization of a high boundary tone, probably a result of knowledge of English, which overrides this pattern at the end of an utterance.
8. Concluding Remarks

Before drawing any conclusions it is necessary to reemphasize the scope of this study, which was a case study of one speaker conducted as a preliminary examination of the issues involved. Thus, the patterns displayed cannot be considered normative for either Pang Mandarin or for English as spoken by a native speaker of Chinese. In order to further determine these patterns it would be necessary not only to record more speakers of the Pang variety but also to elicit the information in a more natural manner rather than by having the informant read from a list. Moreover, continued research should attempt to determine the effects of stress and examine how these patterns appear in sentences of greater length and of varied lexical tone. These things aside, though, this research emphasizes the vast amount of work left to be done on Chinese. If intonation patterns can vary fundamentally within one dialect over a small geographic area, then statements made about the Chinese spoken in Beijing can hardly be considered normative, even though this variety is the prestige variety and the one taught in schools. This research also demonstrates the power of prosody in speech. In her English speech, the consultant appeared to continue to make use of an intonation system that more closely resembled that of her native language than it resembled English. Not only was the tonal assignment much denser than that of a native English speaker, but the tonal patterns were assigned to lexical items even when the lexical items occurred within the context of a larger utterance. As a result, this research indicates that it is possible for a speaker to interpret an unfamiliar and distinct intonation system in the same way she interprets the intonation system of her native language.
Thus, it not only offers evidence for the way in which a specific speaker can carry over specific aspects of her native language to a language that uses very different systems from her own, but it ultimately offers insight into the question of how languages interact and affect one another.
Acknowledgements

Thanks to: Joyce McDonough, for all her input; the Chinese consultant, who wishes to remain anonymous, for allowing me to analyze her speech; Tim Nyberg, the "typical" English speaker; Patricia Harmon, for her encouragement, food, prayers, and decision that Lattimore was a perfectly nice place to spend many hours of her time; Dan Yee for the same and for his input on the presentation. Hi Mom and Dad! Above all, soli Deo gloria.
Appendix A Chinese Sentences (from Shen, 81-83)
A. Statements

1. Ta1 gao1sheng1 shuo1.
   he loudly speak
   'He speaks loudly.'

2. Nian2ji2 cai2jue2.
   school grade make ruling
   'The school grade makes a ruling.'

3. Lao3 gu3dong3 jiang3.
   old conservative man speak
   'The conservative old man is speaking.'

4. Bao1shen1gong1 ca1 che1.
   indentured laborer clean vehicle
   'The indentured laborer cleans the car.'

5. Hong2 Bi2tou2 mei2 quan2.
   red nose not power
   '"Red Nose" does not have power.'

6. Lao3 shou3zhang3 mai3 jiu3.
   old senior officer buy wine
   'The old senior officer buys wine.'

7. Guo4lu4ke4 zhao4xiang4.
   passerby take picture
   'A passerby takes pictures.'
B. Unmarked Questions

1. Ta1 gao1sheng1 shuo1?
   he loudly speak
   'He speaks loudly?'

2. Nian2ji2 cai2jue2?
   school grade make ruling
   'The school grade makes a ruling?'

3. Lao3 gu3dong3 jiang3?
   old conservative man speak
   'The conservative old man is speaking?'
4. Bao1shen1gong1 ca1 che1?
   indentured laborer clean vehicle
   'The indentured laborer cleans the car?'

5. Hong2 Bi2tou2 mei2 quan2?
   red nose not power
   '"Red Nose" does not have power?'

6. Lao3 shou3zhang3 mai3 jiu3?
   old senior officer buy wine
   'The old senior officer buys wine?'

7. Guo4lu4ke4 zhao4xiang4?
   passerby take picture
   'A passerby takes pictures?'
C. 'Ma' Questions

1. Ta1 gao1sheng1 shuo1 ma?
   he loudly speak ?-part.
   'Does he speak loudly?'

2. Nian2ji2 cai2jue2 ma?
   school grade make ruling ?-part.
   'Does the school grade make a ruling?'

3. Lao3 gu3dong3 jiang3 ma?
   old conservative man speak ?-part.
   'Is the conservative old man speaking?'

4. Bao1shen1gong1 ca1 che1 ma?
   indentured laborer clean vehicle ?-part.
   'Does the indentured laborer clean the car?'

5. Hong2 Bi2tou2 mei2 quan2 ma?
   red nose not power ?-part.
   'Does "Red Nose" not have power?'

6. Lao3 shou3zhang3 mai3 jiu3 ma?
   old senior officer buy wine ?-part.
   'Does the old senior officer buy wine?'

7. Guo4lu4ke4 zhao4xiang4 ma?
   passerby take picture ?-part.
   'Does a passerby take pictures?'
D. A-not-A Questions

1. Bao1shen1gong1 ca1 mei2 ca1 che1?
   indentured laborer clean not clean vehicle
   'Does the indentured laborer clean the car, or not?'

2. Hong2 Bi2tou2 you3 mei2 you3 quan2?
   red nose have not have power
   'Does "Red Nose" have power, or not?'

3. Lao3 shou3zhang3 mai3 mei2 mai3 jiu3?
   old senior officer buy not buy wine
   'Does the old senior officer buy wine, or not?'

4. Guo4lu4ke4 zhao4 mei2 zhao4 xiang4?
   passerby take not take picture
   'Does a passerby take pictures, or not?'
Appendix B English Sentences
A. Statements
1. Melanie won a new car.
2. Allen married Marie.
3. Mary remembered the alien.
4. Leah will unite the women.
5. A llama is a mammal.
6. Annie made the lemonade.
B. Unmarked Questions
7. Melanie won a new car?
8. Allen married Marie?
9. Mary remembered the alien?
10. Leah will unite the women?
11. A llama is a mammal?
12. Annie made the lemonade?
C. 'Did' Questions
13. Did Melanie win a new car?
14. Did Allen marry Marie?
15. Did Mary remember the alien?
16. Will Leah unite the women?
17. Is a llama a mammal?
18. Did Annie make the lemonade?
D. 'Or not' Questions
19. Did Melanie win a new car, or not?
20. Did Allen marry Marie, or not?
21. Did Mary remember the alien, or not?
22. Will Leah unite the women, or not?
23. Is a llama a mammal, or not?
24. Did Annie make the lemonade, or not?
Appendix C Chinese Words
1. zhen1 'needle'
2. an1 'saddle'
3. lei2 'thunder'
4. fen2 'grave'
5. lan2 'blue'
6. tui3 'leg'
7. yang3 'admire'
8. wa3 'shingle'
9. jian3zi 'scissors'
10. bi3 'brush'
11. xian4 'thread'
12. jin4 'enter'
13. hong2 'rainbow'
14. dui4zi 'right'
15. shan4zi 'fan'
16. mei4zi 'younger sister'
17. bing4 'disease'
18. fan4 'meal'
19. jin4 'near'
20. gui4zhao2 'butterfly'
21. bi4zi 'fine tooth comb'
22. dou4zi 'bean'
23. tu4zi 'rabbit'
24. mao4zi 'hat'
25. ma1 'mother'
26. ma2 'hemp'
27. ma3 'horse'
28. ma4 'scold'
29. ma (question particle)
30. na2 'carry'
31. na3 'which'
32. na4 'stammer'
33. ren2 'person'
34. ba2 'pull out'
35. ba3 'bridle'
36. ba4 'dam'
37. la1 'garbage'
38. la3 'trumpet'
39. la4 'spicy'
40. wa1 'frog'
41. mai3 'buy'
42. mai2 'bury'
43. mai4 'sell'
44. wa2 'baby'
45. wa4zi 'socks'
Appendix D English Words
1. Marie
2. lemonade
3. new
4. melon
5. unite
6. alien
7. alumni
8. remember
9. mole
10. yellow
11. animal
12. malaria
13. meal
14. name
15. long
16. llama
17. banana
References:

Beckman, Mary. Stress and Non-Stress Accent. Dordrecht: Foris, 1986.
Beckman, Mary E., and Gayle M. Ayers. Guidelines for ToBI Labelling (version 1.5). The Ohio State University Research Foundation, 1993.
Beckman, Mary E., and Janet Pierrehumbert. "Intonational Structure in Japanese and English." Phonology Yearbook 3 (1986): 255-310.
Chao, Yuen Ren. A Grammar of Spoken Chinese. Berkeley: University of California Press, 1968.
Gårding, Eva. "Chinese and Swedish in a generative model of intonation." Nordic Prosody III, Papers from a Symposium. Ed. Claes-Christian Elert et al. Stockholm: Almqvist and Wiksell, 1984. 79-91.
Giet, Franz. "Phonetics of North-China Dialects." Monumenta Serica 11 (1946): 233-268.
Giet, Franz. Zur Tonitaet Nordchinesischer Mundarten. Vol. 2 of Studia Instituti Anthropos. 2nd ed. Vienna: Missionsdruckerei St. Gabriel, 1950. 184.
Godjevac, Svetlana. An Autosegmental/Metrical Analysis of Serbo-Croatian Intonation. Aug. 1999. Ohio State U. May 2000.
Hayes, Bruce. Metrical Stress Theory. University of Chicago Press, 1995.
Hirst, Daniel, and Albert Di Cristo, eds. Intonation Systems: A Survey of Twenty Languages. Cambridge: Cambridge University Press, 1998.
Ho, Aichen T. "Intonation Variation in a Mandarin Sentence for Three Expressions: Interrogative, Exclamatory and Declarative." Phonetica 34 (1977): 446-457.
Kratochvil, Paul. "Intonation in Beijing Chinese." Intonation Systems: A Survey of Twenty Languages. Ed. Daniel Hirst and Albert Di Cristo. Cambridge: Cambridge University Press, 1998. 417-431.
Ladd, D. Robert. Intonational Phonology. Cambridge Studies in Linguistics. New York: Cambridge University Press, 1996.
Peng, Shu-hui. Ohio State U. August 1999.
Pierrehumbert, Janet B. "The Phonology and Phonetics of English Intonation." Unpublished Ph.D. dissertation. MIT, 1980.
Shen, Xiao-nan Susan. The Prosody of Mandarin Chinese. Vol. 118 of University of California Publications in Linguistics. Berkeley: University of California Press, 1990. 95.
Xu, Yi. "Effects of tone and focus on the formation and alignment of F0 contours." Journal of Phonetics 27 (1999): 55-105.
University of Rochester Working Papers in the Language Sciences, Vol. Spring 2000, no. 1. Katherine M. Crosswhite and Joyce McDonough (eds.)
Simple Recurrent Networks and Competition Effects in Spoken Word Recognition

James S. Magnuson ([email protected]), Michael K. Tanenhaus ([email protected]), Richard N. Aslin ([email protected])

Department of Brain and Cognitive Sciences, University of Rochester, Meliora Hall, Rochester, NY 14627 USA
Abstract Continuous mapping models of spoken word recognition such as TRACE (McClelland and Elman, 1986) make robust predictions about a wide variety of phenomena. However, most of these models are interactive activation models with preset weights, and do not provide an account of learning. Simple recurrent networks (SRNs, e.g., Elman, 1990) are continuous mapping models that can process sequential patterns and learn representations, and thus may provide an alternative to TRACE. However, it has been suggested that the features that allow SRNs to learn temporal dependencies lead them to work much like the Cohort model (e.g., Marslen-Wilson, 1987), such that items are activated by onset similarity to an input, but not by offset similarity (Norris, 1990). This would make them incompatible with TRACE and with recent results indicating that words that rhyme compete during spoken word recognition (Allopenna, Magnuson and Tanenhaus, 1998). We present simulations demonstrating that rhyme effects do emerge in SRNs, but this depends on how the training is carried out. We also find that SRN predictions provide a good fit to a series of recent studies of the time course of competition effects in spoken word recognition, including cohort, rhyme, and neighborhood density effects.
Introduction Continuous mapping models of spoken word recognition (so called because lexical items are activated continuously as a function of their similarity to an input stimulus, without explicit consideration of word boundaries) such as TRACE (McClelland and Elman, 1986) have proven to be robust models of a wide range of spoken word recognition phenomena. However, most continuous mapping models are interactive activation models, in which the weights of connections
http://www.ling.rochester.edu/wpls/s2000n1/magnuson.etal.pdf
between units are preset on the basis of theoretical assumptions. While TRACE, for example, can be criticized as unrealistic in several respects (see Norris, 1994), we find the largest drawback of interactive activation models to be their inability to model learning and development. Simple recurrent networks (SRNs) are another sort of continuous mapping model, one that can also learn. SRNs are similar to standard feed-forward networks, but have an added component: a set of context units that contain a copy of the hidden unit activations from the previous time step (the SRN architecture is described in more detail below). This feature allows SRNs to learn a broad range of sequentially-dependent inputs. Norris (1990) reported that, as one might expect given sequential stimuli, SRNs show a "left-to-right" bias: in Norris' simulations, words that overlapped at onset with an input became active, but words which overlapped at offset did not. Such performance would be consistent with the Cohort model (e.g., Marslen-Wilson, 1987), in which a "cohort" of possible matches to an input is winnowed down to a unique match by removing items as they mismatch with the input, in a left-to-right manner. Because onsets come first, the Cohort model predicts a large bias towards activation of items sharing onsets, and against activation of items mismatching at onset, even given later overlap. This means that rhymes, such as beaker and speaker, are not predicted to activate one another. TRACE, on the other hand, predicts that rhymes should compete, which is consistent with recent empirical work: Allopenna, Magnuson and Tanenhaus (1998) reported evidence of robust rhyme competition during spoken word recognition.
We are concerned about this reported discrepancy between SRNs and TRACE because the prediction of rhyme effects provides a critical distinction between continuous mapping models and "alignment" models like Cohort (so-called because of their emphasis on finding or assuming word boundaries). In the next section, we discuss the differences between continuous mapping and alignment models with respect to rhyme competition, and briefly review the empirical evidence for rhyme competition. Then, we present SRN simulations using Norris' (1990) materials, followed by simulations of the results of Allopenna et al., as well as more recent work by Magnuson and colleagues (Magnuson, Dahan, Allopenna, Tanenhaus, & Aslin, 1998; Magnuson, Tanenhaus, Aslin & Dahan, 1999).

Cohort and Rhyme Effects and Models of Spoken Word Recognition

Theories of spoken word recognition agree on a broad set of basic assumptions: given a spoken word, multiple lexical candidates are activated and compete for recognition. The degree to which items become active depends on their similarity to the input, as well as other characteristics (e.g., their frequency of occurrence).
Magnuson et al. - Simple Recurrent Networks and Competition Effects
Where models tend to differ is in the set of candidate words predicted to become active. One division that can be made is between alignment and continuous mapping (or continuous activation) models. Alignment models (e.g., Cohort: Marslen-Wilson, 1987; and Shortlist: Norris, 1994) postulate mechanisms which actively seek (or assume) word boundaries. In the Cohort model, candidates are evaluated as to how well they match an input word beginning from word onset. Activations are greatly reduced given mismatches between input and candidate. Continuous mapping models give no special consideration to word onsets. Instead, items become active as a function of their moment-to-moment similarity to the input, with no explicit penalty for mismatches. The term continuous mapping is potentially confusing. It does not simply mean the model continuously provides an output. For example, TRACE is a continuous mapping model, but effectively becomes an alignment model when its explicit end-of-word "silence phoneme" is used to mark word boundaries.1 Similarly, while the interactive activation and competition decision level of Shortlist provides continuous output, Shortlist is very much an alignment model, since mismatches are explicitly penalized based on aligning a candidate word with a known word boundary. One might expect that explicitly searching for word boundaries would be an efficient or even optimal strategy. But consider the variability we experience in using spoken language. We recognize speech in countless circumstances where the acoustics of speech vary tremendously: outdoors, in stairwells, with different talkers, who might have different accents, or who might have just taken a bite out of a hamburger. A recognition mechanism optimized for clear speech (where word boundaries will still be difficult to find) may spend most of its time reanalyzing mis-segmented speech.
A system which does not tie itself to word boundaries might prove more robust, since a wider range of possible matches to the input will be considered. One result of the differences between continuous mapping and alignment models is a contrast in whether or not rhymes are predicted to compete. Both types of model predict that words sharing onsets will compete. Alignment models, because of the emphasis on mismatches, predict that candidates that mismatch at onset will compete weakly only if the initial mismatch is small (with evidence suggesting the mismatch can be no larger than one or two features; e.g., Connine, Blasko & Titone, 1993). However, because activation in continuous mapping models depends on overall similarity with no privilege given to onset or offset, rhymes are predicted to compete (although, e.g., in TRACE, they will be less strongly
__________
1 TRACE uses a brute force approach to solve the alignment problem: it employs multiple detectors for each word, aligned with different temporal positions.
activated than words sharing onsets, since onset competitors will inhibit the rhymes before the input begins to overlap with the rhymes). Until recently, the empirical record favored alignment models; evidence for rhyme activation was weak at best. However, Allopenna, Magnuson and Tanenhaus (1998) reported robust rhyme effects using the recently developed "visual world paradigm" (e.g., Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In this paradigm, participants respond to spoken instructions to move objects in a visual display, and their eye fixations are measured continuously. Fixations turn out to be tightly time-locked to speech -- at least given a task in which visually guided movements are required (which avoids problems of interpretation raised by Viviani, 1990, since the eye movements have a functional interpretation; see Allopenna et al. for more discussion). Fixations to objects whose names are similar to an input word begin as early as 200 ms after the onset of the input word. In very simple tasks, participants require approximately 150 msec to plan and launch a saccade (e.g., Matin, Shao, & Boff, 1993). Allowing for this planning time, the earliest eye movements are being planned approximately 100 msec after target onset. Thus, these fixations are indeed closely time-locked to the speech (compare them to minimal response times of about 200 ms after the offset of monosyllabic words in lexical decision). Figure 1 shows the TRACE predictions and observed fixation patterns for critical trials from Allopenna et al. (1998). On most trials, subjects were asked to "click on" a target item (using the computer's mouse) displayed with three unrelated items. On critical trials, onset competitors (referred to as "cohorts" because they are the items predicted to compete in the Cohort model) and/or
[Figure panels: Referent (e.g., "beaker"), Cohort (e.g., "beetle"), Rhyme (e.g., "speaker"), Unrelated (e.g., "carriage"); fixation probability vs. time since target onset (msec)]
Figure 1: TRACE activations converted to predicted fixation probabilities (left panel) and observed probabilities of fixating a target referent, a cohort member, a rhyme, and an unrelated object (right panel) from Allopenna et al. (1998).
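The transformation from TRACE activations to the predicted probabilities in the left panel is described in the text as a variant of the Luce choice rule. A minimal sketch of such a rule is below; the exponential form and the scaling constant k = 7.0 are illustrative assumptions, not the fitted values from Allopenna et al.

```python
import math

def luce_choice(activations, k=7.0):
    """Variant of the Luce choice rule: map raw lexical activations for
    the displayed items onto choice (fixation) probabilities.
    k (an illustrative value here) controls how sharply probability
    concentrates on the most active item."""
    strengths = [math.exp(k * a) for a in activations]
    total = sum(strengths)
    return [s / total for s in strengths]

# Hypothetical activations for referent, cohort, rhyme, and unrelated
# items at a single time step.
probs = luce_choice([0.6, 0.4, 0.3, 0.1])
print([round(p, 3) for p in probs])  # four probabilities summing to 1
```

The resulting probabilities are what get compared against the observed fixation proportions over time.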
rhyme competitors were present. TRACE activations were transformed into predicted fixation probabilities using a variant of the Luce choice rule (see Allopenna et al.). As TRACE predicts (TRACE accounts for greater than 90% of the variance in each of the critical items), the data indicate that items compete for recognition as a function of their similarity to a stimulus over time, and even substantial initial mismatches do not block rhyme activation (since all of the rhymes differed by more than two features). What do SRNs predict? Norris (1990) reported that the performance of SRNs is consistent with the Cohort model, since he found evidence of cohort (onset) competition, but not offset competition. We will begin our examination of SRNs by training an SRN to recognize the words in Norris' (1990) training lexicon.

Simulation 1: Norris (1990)

The basis for Norris' claim was a simulation using a 48-word lexicon consisting of phonemic transcriptions of 24 real words, plus 24 non-words created by reversing the transcriptions of the real words. This meant that for each pair overlapping at onset, there was a corresponding pair overlapping in the same segments (in reverse order) at offset. The 24 real words were baker, beat, boot, border, bounded, calm, cold, coroner, coronet, damp, delimit, deliver, dish, disk, door, fear, finish, flash, heap, hurt, pound, taker, trash, and tripe. We used phonemic transcriptions of these, and reversals of the transcriptions, as our lexicon. Our phonemic transcriptions differed from Norris'. His were coded phoneme-by-phoneme, using a set of 11 features. Ours were coded similarly, but using a set of 18 features derived from O'Grady et al. (1989). Otherwise, our simulations were quite similar. Figure 2 shows a schematic of the SRN we used. Before describing the simulation further, we will briefly describe the network we used and some general properties of SRNs.
SRNs are nearly identical to standard feed-forward networks trained using backpropagation. The innovation that makes them good candidates for learning sequential patterns is the use of a context layer. In Figure 2, solid arrows between layers indicate full trainable connectivity (each unit in the lower layer has a weighted connection to each unit in the upper layer, and the weights on those connections can be modified during training). The dashed arrow indicates nonmodifiable "copy-back" connections between the hidden layer and the context layer. At each time step, part of the input to the hidden layer is from the context layer. Context layer activation is a direct, one-to-one copy of the hidden layer activation from the previous time step. This allows the network to react not just to the current input, but also to its own state at the previous time step, t-1, and to its state in multiple preceding time steps, since its hidden layer activation at the
previous time step would have been influenced by its input from the context layer containing copies of hidden unit activations at time t-2, and so on. Initially, all trainable weights are set to small, random values. The weights are then modified as each input is presented, using backpropagation. Activation from one layer is passed through weighted, trainable connections to the next layer; input and context activations are passed to the hidden layer, and hidden unit activations are passed through weighted, trainable connections to the output units. Output error is computed for each output unit as the difference between a desired output and the actual output. Hidden-to-output weights are changed according to how much of the error was contributed by each weighted connection. Error is propagated back to the hidden layer by assigning each hidden unit a proportion of responsibility for the output error, and changing the incoming weights from the input and context layers accordingly. For the current simulation using Norris' word list, we proceeded as follows. The network consisted of 18 input units (one for each phonetic feature), 20 hidden and context units, and 48 outputs (one for each lexical item, using a localist representation). Bias units were used for both the hidden and output units, and bias activation was always set to 1. The network was trained for many epochs, with a learning rate of .05. At each epoch, the list of 48 items was randomly ordered, and each item was then presented phoneme-by-phoneme. The network's task at each time step was to indicate the lexical item being presented by activating that word's localist output unit and setting all other lexical units to zero. Context activations were not reset to 0 between words, as is sometimes done with SRNs. Resetting the context activations would effectively make the SRN an alignment model, since an explicit cue to word boundaries would be given.
Figure 2: Schematic of the SRNs used. Input units (18 phonetic-feature "phonemes") and context units (an exact copy of the hidden units at time t-1) feed the hidden units, which feed the output units (one per lexical item). Solid lines indicate fully connected, trainable, weighted connections. The dotted line indicates an exact 1-to-1 copy from each hidden unit to a corresponding context unit.

As in Norris' simulation, we found little co-activation at offset between the reversed cohort pairs which overlapped only in one or two final phonemes. (Note
Figure 3: Rhyme effects using a variant of Norris' (1990) lexicon. Each panel plots activation by phoneme; the three most active items are shown in each case. The top two panels show the effects for baker and taker. The middle panels show the effects for ksid and hsid. Note the cohort effect for /SId/. The bottom panels show the effects for renoroc and tenoroc.
that for now, we will talk about co-activation and not competition; later, we will discuss whether activations in these simulations indicate competition.) However, other models, such as TRACE, would not predict much competition between these items, either, because they overlap so little.
If we consider items with more complete rhymes, the results are quite different. Of the 48 items, there were 7 rhyme pairs (given in their orthographic forms here): baker/taker, renoroc/tenoroc (coroner/coronet reversed), reviled/timiled (deliver/delimit reversed), dish/finish, hsid/ksid (dish/disk reversed), hsinif/raef (finish/fear reversed), and flash/trash. We examined the performance of the network after 10,000 epochs. By this point, the most activated word unit was always the correct item by the last phoneme. Strong rhyme co-activation was observed for three of the pairs after 10,000 epochs of training (baker/taker, renoroc/tenoroc, and hsid/ksid), and weak co-activation was observed for trash/flash and dish/finish. The two pairs which did not show even weak co-activation overlapped only slightly in the last syllable, so the lack of activation is not surprising. There were also co-activation effects for these items earlier in training, with rhymes more active than unrelated items; however, prior to the 10,000-epoch mark, not all items were being "correctly identified" by the last phoneme. We will return to this in the discussion section. The results for the three strong rhyme pairs after 10,000 epochs of training are presented in Figure 3. There is an asymmetry in each of the pairs in Figure 3. This is correlated with the density of the initial two segments in the items. For example, while there were more words beginning with /t/ than with /b/ (8 vs. 5), there were 7 initial CV or CC sequences beginning with /t/, as compared to 4 for /b/. Only one /t/-word overlapped by more than one onset phoneme with another word. The transitional probability for /te/ was .125, and .2 for /be/. It appears that with this lexicon, the network has learned to minimize error by "reserving judgment" for some initial sequences.
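The transitional probabilities just mentioned are simple conditional probabilities over the word list: of all words sharing the first segment, what fraction continue with the second? A sketch, using a hypothetical toy lexicon rather than the 48-item Norris lexicon:

```python
def transitional_probability(lexicon, onset):
    """P(second segment | first segment), estimated from a word list: of all
    words starting with onset[0], the fraction that continue with onset[1]."""
    starts = [w for w in lexicon if w.startswith(onset[0])]
    if not starts:
        return 0.0
    return sum(w.startswith(onset) for w in starts) / len(starts)

# A made-up toy lexicon (NOT the Norris lexicon) illustrating the asymmetry:
# more /t/-onset words overall, but /be/ is a larger share of the /b/ cohort.
toy = ["teka", "tipo", "tado", "tuno", "tymo", "beka", "bipo", "bado"]
p_te = transitional_probability(toy, "te")  # 1 of the 5 t-words
p_be = transitional_probability(toy, "be")  # 1 of the 3 b-words
```

In the toy list, as in the simulation lexicon, the /b/ cohort yields the higher transitional probability even though there are fewer /b/-words overall.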
There is less error associated with no response than with activating all members of a large onset cohort when the cohort can be narrowed significantly by waiting for one more segment. For the other pairs, the asymmetry is due to the misalignment of the rhymes; in each case, the item with an initial disadvantage has one more segment preceding the overlapping offset. That we still find rhyme effects given this misalignment bodes well for handling the rhyme effects Allopenna et al. (1998) reported. Not only do we find rhyme activation for items differing by more than one feature at onset, we find it for misaligned rhymes similar to those among the Allopenna et al. stimuli (e.g., beaker/speaker). Why did we find substantial rhyme co-activation when Norris reported no co-activation due to offset overlap? Some of the examples he presented had rhymes, but in his simulation, rhymes were never active. The differences between our simulations might have been due to learning rate, our phonological representations (ours were richer than those used by Norris), or the amount of training. The most likely explanation is that the amount of learning Norris allowed before examining the SRN's performance led to the elimination of rhyme effects. We replicated our simulation, but accelerated learning with a learning rate of .5. After 200 epochs,
the SRN's output was similar to that of our original simulation, albeit somewhat noisier. However, by 1000 epochs, rhyme effects disappeared, presumably because the model learned to give more weight to context activations for rhymes, and cohort co-activations nearly mirrored transitional probabilities. This means the SRN had learned the lexicon nearly perfectly, which we will argue later provides a poor analog to the human language processor.

Simulation 2: Allopenna et al. (1998)

Simulation 1 demonstrated that SRNs do predict rhyme activation under certain circumstances. We now turn to the question of how well those predictions match human data, specifically the data from Allopenna et al. (1998) shown in Figure 1. We used an SRN similar to the one described for the previous simulation, except that it had 23 localist outputs (one for each possible response), 40 hidden and context units, and a learning rate of .1.[2] The items we used were phonemic transcriptions of the words beaker, beetle, speaker, carrot, carriage, parrot, candle, candy, handle, pickle, picture, nickel, casket, castle, basket, paddle, padlock, saddle, dollar, dolphin, collar, sandal, and sandwich. The training procedure was identical to that for the previous simulation. For each epoch of training, the words were randomly ordered and then presented phoneme-by-phoneme (using the 18-feature vector representation) to the SRN. The desired output was the current word, and context unit activations were not reset between words. We chose to examine the model after 1500 epochs of training, because at that point, the correct output node was always the most highly active by the last phoneme of each word, but rhyme and cohort effects were still present. In order to compare the model's performance to the data in Figure 1, we chose all of the target-cohort-rhyme sets in which the target was four phonemes in length (five of the eight sets, with the targets beaker, dollar, pickle, paddle, and sandal).
Nearly identical effects were found for the other targets (carrot and candle, of length 5, and casket, of length 6), but we restricted our analyses to 4-phoneme targets because it is not clear how responses to words of different lengths should be combined. We averaged the cohort and rhyme conditions for each of the 4-phoneme targets. The average output is shown in the top panel of Figure 4. Target, cohort, and rhyme activations represent the averages across all 4-phoneme sets. The unrelated activation is the maximum value found at each phoneme from any set.
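The averaging scheme just described (mean curves for target, cohort, and rhyme; the maximum at each phoneme for unrelated items) can be sketched with made-up activation values; the numbers below are purely illustrative, not the simulation's actual outputs:

```python
import numpy as np

# Hypothetical activations: rows = target-cohort-rhyme sets,
# columns = phoneme positions 1-4 (illustrative values only).
target    = np.array([[0.20, 0.50, 0.80, 0.95], [0.25, 0.55, 0.75, 0.90]])
cohort    = np.array([[0.20, 0.45, 0.30, 0.10], [0.22, 0.50, 0.25, 0.08]])
rhyme     = np.array([[0.05, 0.10, 0.30, 0.20], [0.06, 0.12, 0.28, 0.18]])
unrelated = np.array([[0.02, 0.03, 0.02, 0.01], [0.04, 0.02, 0.03, 0.02]])

# Target, cohort, and rhyme curves: the mean across sets at each phoneme.
avg_target = target.mean(axis=0)
avg_cohort = cohort.mean(axis=0)
avg_rhyme  = rhyme.mean(axis=0)
# Unrelated curve: the maximum value found at each phoneme from any set.
max_unrelated = unrelated.max(axis=0)
```

Taking the maximum rather than the mean for the unrelated items yields a conservative baseline: the competitor curves must exceed even the most active unrelated item to count as co-activation.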
[2] Note that a wide range of parameters (number of hidden and context units, learning rate, and training epochs) leads to the same result (as evidenced by our successful replication of Simulation 1 with a larger learning rate and smaller number of training epochs). For training sets like the one used for this simulation, increasing the number of hidden units allows a smaller learning rate to arrive at the desired performance threshold ("recognition" of targets, i.e., target units having the highest activation at word offset).
A weakness of the current input representations is that entire phonemes are presented in a single time step. An input representation which allowed more continuous input presentation would clearly provide a better comparison to the human data. In order to compare the current simulation output to the data, we used linear interpolation and extrapolation to fit the simulation output to the data. There were 30 frames of human data (each frame corresponding to a 33 msec video frame). In order to stretch our four simulation data points, we aligned point 1 to the fifth frame of the human data, which was the frame before any of the fixation probabilities were greater than .01. Then, 5 frames were inserted between each simulation data point. This intentionally took us to frame 20: at the last simulation data point, the rhyme activation has decreased from its peak value, and frame 20 corresponds to a similar point in the human data. From frame 21 to frame 30, we assumed the target probability should rise to 1, and the other values should decrease to 0, as is true for the human data. We then computed correlations between the interpolated simulation predictions and the human data. The r² values for the targets, cohorts, and rhymes were .87, .93, and .81, respectively. The interpolated simulation output is presented in the lower panel of Figure 4.

Figure 4: Simulations of the cohort and rhyme effects from Allopenna et al. (1998). Top: activations, by phoneme. Bottom: activations interpolated for comparison with the data (msec since target onset).

This simulation demonstrates that not only can SRNs predict rhyme effects, but the predictions map quite well onto human data. At the same time, we do not wish to present this as an adequate model of spoken word recognition. Better input representations are needed, and the lexicon for this simulation was rather odd, given that every item had a cohort and/or rhyme. A replication with a more naturalistic lexicon is needed, and we are currently working on assembling such a lexicon. However, the results are still quite promising, and we need not alter SRNs
in order to provide a coarse account of rhyme effects.

Simulation 3: Magnuson et al. (1998, 1999)

Magnuson et al. (1998, 1999) replicated and extended the Allopenna et al. (1998) results. In order to have precise control over distributional characteristics of the input, we constructed an artificial lexicon with very specific properties. The lexicon consisted of 16 words. The novel words were randomly mapped onto distinct, novel geometric objects. Subjects learned the names of the objects over two days of training (either two or four objects would be displayed, and the subject would hear an instruction such as, "click on the pibu"; after the subject
Figure 5: Major trends from the artificial lexicon studies (Magnuson et al., 1998, 1999). Top left: the basic pattern of target, cohort, rhyme, and unrelated fixation probabilities over time since target onset. Top right: the "absent competitor" conditions (high-frequency targets and unrelated items, with high- vs. low-frequency competitors absent from the display). Bottom: cohort effects (e.g., target pibo, cohort pibu, unrelated bapu) when target and competitor frequency are equal (left) and when target frequency is lower than competitor frequency (right); average target offset is marked.
clicked on one object, feedback was given by removing all the incorrect choices from the display and repeating the target name). The lexicon could be divided into four sets of four words. For example, one set was /pibu/, /pibo/, /dibu/, and /dibo/. Each item has one cohort (/pibu/ and /pibo/, /dibu/ and /dibo/) and one rhyme (/pibu/ and /dibu/, /pibo/ and /dibo/). The real advantage of these subsets was the frequency manipulation they allowed. For example, if /pibu/ and /dibo/ (which were not predicted to compete significantly) were presented with high frequency in the learning phase, and /pibo/ and /dibu/ were low-frequency, we would have two different frequency conditions: high-frequency items with low-frequency competitors, and vice versa. In Magnuson et al. (1998), items were either high- or low-frequency, such that there were four target/competitor frequency conditions: high/high, low/low, high/low, and low/high. In Magnuson et al. (1999), a third, "medium" level of frequency was added, which allowed a crucial test. On some trials, high-frequency targets which had either high- or low-frequency competitors were presented among three unrelated, medium-frequency distractors. If competition effects in the paradigm were driven only by the characteristics of items displayed on a given trial, there should have been no difference in the fixation probabilities to these items, since their high- or low-frequency competitors were absent from the display. However, we found robust differences; fixation probabilities rose much more slowly for high-frequency targets with high-frequency competitors than for high-frequency targets with low-frequency competitors. The major trends from these experiments are shown in Figure 5. In the top left panel, target, cohort, rhyme, and unrelated probabilities averaged over all conditions are shown. In the top right panel, target and average unrelated distractor probabilities are shown from the "absent competitor" condition just described.
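The frequency manipulation amounts to controlling how often each item occurs during the learning phase. A sketch of building one such epoch, using the example subset above, with high-frequency items occurring twice as often as low-frequency items (the 2:1 ratio is one possible choice; the actual presentation schedules differed across the experiments):

```python
import random

# Item names from the example subset; /pibu/ and /dibo/ high-frequency,
# /pibo/ and /dibu/ low-frequency, as in the example in the text.
high_freq = ["pibu", "dibo"]
low_freq = ["pibo", "dibu"]

def make_epoch(seed=None):
    """One learning-phase epoch: each high-frequency item appears twice,
    each low-frequency item once, in random order."""
    epoch = high_freq * 2 + low_freq
    random.Random(seed).shuffle(epoch)
    return epoch

epoch = make_epoch(seed=0)
```

Over many epochs, the 2:1 presentation ratio is what defines "frequency" for these novel words, since subjects have no prior experience with them.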
The lower panels show cohort effects when (bottom left) target and competitor frequencies are equal and (bottom right) target frequency is less than competitor frequency. These results provided the first fine-grained measures of the time course of activation and competition among items as functions of phonological similarity and experience (see Dahan, Magnuson and Tanenhaus, submitted, for similar results using real words). To verify that an SRN would capture the major trends of these studies, we conducted a simulation with the same 18-feature input units from Simulations 1 and 2, 20 hidden and context units, 16 localist lexical outputs, and a learning rate of .01. High-frequency items were presented twice as often as low-frequency items. A small high/low ratio, a small learning rate, and many training epochs (528,000) were required to fit the data (although again, a range of parameters
Figure 6: Simulations of the major trends from the artificial lexicon studies. Top left: the basic pattern of target, cohort, rhyme, and unrelated activations by phoneme. Top right: the "absent competitor" conditions (high-frequency targets with high- vs. low-frequency competitors). Bottom: cohort effects when target and competitor frequency are equal (left) and when target frequency is lower than competitor frequency (right).
would provide good fits). The simulation results are shown in Figure 6. For all four panels, activations are based on simulations using the entire lexicon. Rather than using a variant of the Luce choice rule, as Allopenna et al. (1998) did, to capture the constraints of the subjects' task, we present these results as evidence that the SRN provides a basis for the major trends of the artificial lexicon studies.

Discussion

The simulations described here show that rhyme effects are predicted by SRNs, but that the amount of co-activation observed depends on training parameters and the similarity density at each segment. Among training parameters, the most
important one would appear simply to be the amount of training; we replicated Simulation 1 even after increasing the learning rate by an order of magnitude. Whether or not we observed rhyme effects depended on when training was stopped. If an SRN is trained until virtually no changes occur in connection weights, and if it has a sufficient number of hidden and context units to represent the temporal dependencies of the input, its outputs will mirror the statistics of the lexicon perfectly. This does not provide a good analog to the human language processor. There are obviously many differences between the learning situations of our SRNs and a human learner. One is that our SRN always received an input of perfect fidelity (with the exception of context activations at word onsets, which will contain arbitrary information about the ending of the preceding, randomly selected word), whereas the human must adapt to differences in acoustics, talker, dialect, and rate, among others. One coarse way to approximate this is to end the learning phase for the SRN before it has learned its input perfectly, i.e., while its representations are still noisy. A simple way to improve our inputs might be to add noise to them, or to simulate coarticulation by allowing adjacent phonemic representations to blend with one another. A better alternative would be to use more realistic input. The phonetic features used here could be replaced by pseudo-acoustic representations, like those used for TRACE, or those developed by Plaut and Kello (1999), although noise to simulate natural variability would still be called for. However, the current results do show that cohort, rhyme, frequency, and neighborhood density effects can be expected from generic SRNs, which paves the way to further explorations of SRNs as models of spoken word recognition. One issue that must be explored is the role of competition at the output level. 
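The two approximations suggested above, adding noise to the input features and blending adjacent phonemes to mimic coarticulation, could be sketched as follows. The noise level and blending proportion are arbitrary illustrative choices, not values we have evaluated:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(features, sd=0.05):
    """Corrupt feature values with Gaussian noise, clipped back to [0, 1].
    The noise level sd is an arbitrary illustrative choice."""
    return np.clip(features + rng.normal(0.0, sd, features.shape), 0.0, 1.0)

def blend_neighbors(word, alpha=0.2):
    """Crudely simulate coarticulation: mix each phoneme's feature vector
    with a proportion alpha of its neighbors' average (alpha is arbitrary)."""
    blended = word.copy()
    for i in range(len(word)):
        neighbors = [word[j] for j in (i - 1, i + 1) if 0 <= j < len(word)]
        if neighbors:
            blended[i] = (1 - alpha) * word[i] + alpha * np.mean(neighbors, axis=0)
    return blended

word = rng.integers(0, 2, (4, 18)).astype(float)  # 4 phonemes x 18 binary features
noisy = add_noise(word)
smeared = blend_neighbors(word)
```

Either transformation makes the training input variable across presentations of the same word, which is one way to keep the network from learning the lexicon "perfectly" and thereby losing competitor effects.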
McQueen and his colleagues (e.g., McQueen, Cutler, Briscoe & Norris, 1995), among others, have argued that competition is required to model some phenomena in spoken word recognition (e.g., the temporal dynamics of processing embedded words). It is possible that an explicit competition mechanism would not be required for a trained SRN to model these phenomena. Although SRNs are often described as purely bottom-up processing models (e.g., Cairns, Shillcock, Chater & Levy, 1997), there are two respects in which they are top-down. The first is that hidden unit activations are passed recurrently through the network via the context units. The context unit activations are not a bottom-up source of information, in that they are directly related to the state and output of the network at the previous time step. Second, learning in an SRN is driven in a top-down fashion. Weights are updated based on an error signal propagated back from an explicit comparison of the output to a desired output. Two co-active output units effectively reflect competition among subsets of the weighted connections for the reward of weight increases. It is possible that this
competition during learning may produce a processing system which is functionally equivalent to one feeding into an explicit competition mechanism. While reports of SRNs used to examine, e.g., embedded word effects (Davis, Gaskell, & Marslen-Wilson, 1997) do not support this claim, this may be due to the specific training parameters and lexical characteristics used. We are currently performing lexical analyses of English with the goal of assembling a reasonably small lexicon with a good approximation of the lexical statistics of English. Such a "reasonably" small approximation to a real lexicon will allow simulations to be carried out realistically quickly, with a minimum of effects due to unexpected or unrealistic lexical characteristics.

Acknowledgments

Supported by NIH HD27206 to MKT, NSF SBR-9729095 to MKT and RNA, and an NSF GRF to JSM.

References

Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38, 419-439.

Cairns, P., Shillcock, R., Chater, N., & Levy, J. (1997). Bootstrapping word boundaries: A bottom-up corpus-based approach to speech segmentation. Cognitive Psychology, 33, 111-153.

Connine, C. M., Blasko, D. G., & Titone, D. (1993). Do the beginnings of spoken words have a special status in auditory word recognition? Journal of Memory and Language, 32, 193-210.

Davis, M. H., Gaskell, M. G., & Marslen-Wilson, W. (1997). Recognising embedded words in connected speech: Context and competition. In J. Bullinaria & G. Houghton (Eds.), Proceedings of the 4th Neural Computation in Psychology Workshop.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179-211.

Magnuson, J. S., Dahan, D., Allopenna, P. D., Tanenhaus, M. K., & Aslin, R. N. (1998). Using an artificial lexicon and eye movements to examine the development and microstructure of lexical dynamics. In Gernsbacher, M. A., and Derry, S.
J. (Eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, 651-656.

Magnuson, J. S., Tanenhaus, M. K., Aslin, R. N., & Dahan, D. (1999). Spoken word recognition in the visual world paradigm reflects the structure of the entire
lexicon. Proceedings of the Twenty-first Annual Conference of the Cognitive Science Society, 331-336.

Marslen-Wilson, W. (1987). Functional parallelism in spoken word recognition. Cognition, 25, 71-102.

Matin, E., Shao, K. C., and Boff, K. R. (1993). Saccadic overhead: Information-processing time with and without saccades. Perception & Psychophysics, 53, 372-380.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

McQueen, J. M., Cutler, A., Briscoe, T., & Norris, D. (1995). Models of continuous speech recognition and the contents of the vocabulary. Language and Cognitive Processes, 10, 309-331.

Norris, D. (1990). A dynamic-net model of human speech recognition. In G. T. M. Altmann (Ed.), Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives, 87-104. Cambridge: MIT.

Norris, D. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52, 189-234.

O'Grady, W., Dobrovolsky, M., & Aronoff, M. (1989). Contemporary Linguistics. New York: St. Martin's.

Plaut, D. C., and Kello, C. T. (1999). The emergence of phonology from the interplay of speech comprehension and production: A distributed connectionist model. In B. MacWhinney (Ed.), The Emergence of Language (pp. 381-415). Mahwah, NJ: Erlbaum.

Tanenhaus, M. K., Spivey-Knowlton, M., Eberhard, K., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken-language comprehension. Science, 268, 1632-1634.

Viviani, P. (1990). Eye movements in visual search: Cognitive, perceptual, and motor control aspects. In E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes. Reviews of Oculomotor Research V4. Amsterdam: Elsevier.
University of Rochester Working Papers in the Language Sciences—Vol. Spring 2000, no. 1 Katherine M. Crosswhite and Joyce McDonough (eds.)
Connectionist Modeling for…er… linguists
Bob McMurray ([email protected])
Department of Brain and Cognitive Sciences, University of Rochester, Meliora Hall, Rochester, NY 14627 USA
Abstract

Connectionist modeling (AKA neural network modeling, connectionism) is rapidly becoming a dominant descriptive and theoretical tool for the psycholinguist. Below is a brief introduction to some of the terms and concepts used in connectionist modeling. Connectionist models are no different from any other sort of theory in cognitive science; they merely offer a new computational toolbox, or set of algorithmic constraints on models and theories of cognitive phenomena. In this paper I review many of the important components of connectionist models and introduce some of the strengths, pitfalls, and caveats that casual readers and serious modelers must be aware of.
Introduction

If you've read past the abstract, you must have resisted the urge some linguists feel to put down the article after reading the word "connectionist". Thank you. I'd like to welcome you to our informal field by teaching you some of the lingo you'll need to navigate. I'll try to avoid using the math that modelers love to flaunt and instead focus on the underlying concepts and architectures. Hopefully, after reading this, you will be able to start reading modeling papers and understand much of what is going on. Moreover, you may stop falling asleep at modeling talks. If I'm lucky, you may even collaborate with a psycholinguist to build your own models of linguistic phenomena. Hopefully you'll have enough understanding of the basic terms and issues to do all of these things after reading this paper. Throughout this paper I've tried to put most connectionist terms in boldface so that you can find particular concepts quickly by scanning. Connectionism as a field grew out of work in neurobiology, computer science, electrical engineering, statistics, and cognitive psychology (and probably other fields), so there are often many terms that mean the same thing (depending on what your background is). In these cases, I have tried to provide all of the terms. I've also tried to include terms and concepts that are not formally defined anywhere, but have proven useful to connectionists discussing their work over the years.
http://www.ling.rochester.edu/wpls/s2000n1/mcmurray.pdf
A common question that linguists have asked me is "what is a connectionist model?" The answer is surprisingly simple. A connectionist model is really an algorithm for turning some input (which presumably maps onto something of psychological or linguistic interest) into some other output (which may map onto some data). In this regard it is very similar to any other cognitive or linguistic model that has been implemented computationally. Take, for example, an Optimality Theory grammar. An OT grammar turns a collection of phonological forms from Gen (the input) into the actual production (the output). The only difference between this grammar and a neural network is that the kinds of computations we are allowed to use in creating the algorithm are different. OT prescribes one type of computation (constraint satisfaction), while connectionist models use computations that are very loosely based on the kinds of computations that neurons and populations of neurons might perform. Under this view, connectionism is simply a set of (mostly) agreed-upon guidelines for what sorts of algorithms are appropriate for describing cognitive behavior.
Architecture

All connectionist models are composed of two simple concepts: nodes (AKA neurons or units or cells) and weights (AKA connections or synapses). A node can be considered a highly idealized representation of a neuron. It has an activation (or firing rate) that tells us how strongly that neuron is firing. In a very simple case, a node might be assigned to a real-world concept such as a specific phoneme, /b/. Its neighboring nodes may represent other phonemes, /d/ and /t/. In this case, the activation of the /b/ node relative to the other nodes would tell us how strongly the system believes a /b/ was present in the input. Oftentimes the activation of a node will be simplified by saying the node is either on (firing) or off (not firing, inactive). Keep in mind that very few connectionist models have nodes with discrete activation levels—on or off simply refer to the node having a lot of activation (relative to the other nodes) or a little. Nodes are organized into layers (AKA arrays or vectors). Each layer is a cluster of nodes that are [usually] functionally related. For example, one layer of a network may consist of the group of nodes that correspond to each phoneme; another layer may have nodes that correspond to words.
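In code, a layer is nothing more than a vector of activation values. A sketch of the localist phoneme example above; the labels and the activation numbers are purely illustrative:

```python
import numpy as np

# A localist phoneme layer: one node per phoneme (illustrative labels/values).
phoneme_labels = ["b", "d", "t"]
phoneme_layer = np.array([0.82, 0.31, 0.15])

# "How strongly the system believes a /b/ was present" is just the /b/ node's
# activation relative to its neighbors; the most active node wins.
most_active = phoneme_labels[int(np.argmax(phoneme_layer))]

# A node is "on" only in a relative sense: a lot of activation vs. a little.
# A threshold of 0.5 is an arbitrary choice for the sketch.
on_nodes = [lab for lab, a in zip(phoneme_labels, phoneme_layer) if a > 0.5]
```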
http://www.ling.rochester.edu/wpls/s2000n1/mcmurray.pdf
McMurray—Connectionism for… er… linguists
74
[Figure labels: input layer (array, vector; activations set by the modeler) with input nodes (neurons, units); hidden layer (activations computed from the input layer and its weights) with hidden nodes; output layer (activations computed from the hidden layer and its weights); feedforward, bidirectional, and lateral weights (synapses, connections).]
Figure 1: A typical diagram of a neural network with its features labeled.
In any model, one or more layers is designated the input layer. Stimuli from the outside world are received into the network via this input layer. The stimuli consist of numerical representations of real-world objects or events. When the modeler sets the activations of the nodes of the input layer to match one of these representations, the network has received that stimulus as its input. The patterns of input activation may come from a corpus of text, a digitized waveform, or any other set of stimuli the modeler wishes. Additionally, they could be manually set to arbitrary values if the modeler wishes to abstract away from real input (possibly because the real input is too complex to illustrate the problem the modeler wishes to work with). The set of input activation levels the modeler decides to use is called the training set. Each item in a training set minimally consists of the activations for each input node in the input layer. The training set will sometimes contain other information, such as the expected value of the output nodes for each input. This will be discussed when we talk about learning. Each network will also have one or more output layers. The output layer is the cluster of nodes that will determine the network’s “behavior”. The values of these
nodes are the values that we will attempt to relate to the empirical data that we are trying to evaluate. For example, in a network designed to categorize phonemes, the input layer might represent a digitized waveform, and the output layer would have a node corresponding to each phoneme. The way in which the activation of nodes in the output layer is related to the empirical data or behavior is called the linking hypothesis (because it links models and data). For example, for our phoneme categorization example, our linking hypothesis might be that the model will choose the phoneme with the most activation as the phoneme it heard. I’ll talk more about linking hypotheses later. Layers of nodes that do not receive input or provide output are called hidden layers. These layers compute some sort of intermediate representation (between input and output layers). Many modelers dispense with the input, hidden, and output layer designations altogether and simply refer to layers by what they designate. The TRACE model (McClelland and Elman, 1986), for example, has a feature layer, a phoneme layer, and a word layer, but none of them is designated the output layer. TRACE, in fact, can use either phonemes or words as the output depending on the task at hand. In models like these, one must think about the logical flow of information in a psychological sense to determine the input and output layers. Many models are described simply as 2-layer or 3-layer networks (or more). A 2-layer network will necessarily have only an input and output layer. A 3-layer network will have both of these plus one hidden layer. A 4-layer network will have two hidden layers. In the remainder of this paper, whenever I refer to simply input or output, I will be referring to the entire input or output layers (i.e. the pattern of activations across the nodes in the layer). Oftentimes, a layer of nodes is thought of as a set of coordinates in a multidimensional space.
This is easiest to visualize for a network of two nodes. The activation of the first node could be considered the X-coordinate. The activation of the second node would be the Y-coordinate. Then any particular pattern of activations across the two nodes can be thought of as a unique point in a 2-D coordinate system. So if the input activations for the two nodes were .5 and .8, we could talk about the input as the single point <.5, .8>. Of course, when we move up to larger networks we won’t be able to visualize a 16-dimensional space. However, we can still talk about one, and this spatial metaphor is used frequently. Under this metaphor, the input space would consist of all regions of the possible N-dimensional space that are used in the network (where N = number of input nodes). The output space is the corresponding regions in
M-dimensional space (where M = number of output nodes). People often refer to the dimensionality of a space (which is simply the number of nodes). Then when information is passed from an input space of high dimensionality to an output space of lower dimensionality, the information is undergoing dimensionality reduction—it must be compressed (and some information invariably lost) in order to “fit” in the lower-dimensionality space. This forces the network to group some inputs together and discard others according to the correlations it finds in its inputs. The types of categorizations it makes may be of ultimate interest psychologically. This spatial way of describing network behavior is convenient in several respects. When activation patterns change, we can talk about the network moving to a new point in the input space. Moreover, modelers often speak of learning (which I will discuss shortly) as a search through the weight space. Finally, dimensionality reduction is often thought of as a form of information compression (as a network may have to represent 3-D information, for example, in only two dimensions). Dimensionality reduction is also a common concept used to describe statistical techniques such as factor analysis, clustering, and multidimensional scaling (if you don’t know these terms, that’s fine; I merely throw them out to show that the analogy can be helpful in relating neural network computations to other types of computational tools). In a network, nodes are connected to each other by weights (AKA synapses, connections). Each weight represents the amount of activation that can be passed by one node to another. If an input node is highly active and it has a strong connection to an output node, that output node will also be highly active. If it has a weak connection, that output node will not be highly active. We’ll go over the details of this in a moment.
The set of all weights between two layers is termed the weight matrix (for reasons we’ll see shortly). When a model is built, the weight matrix often starts as a matrix of small random numbers (as we will discuss, it will be modified later by learning). Weights can either excite (make active) or inhibit (make inactive) the nodes they connect. Excitatory weights will cause a node to become more active if the nodes that connect to it are active. Inhibitory weights will cause a node to become less active if the nodes that connect to it are active. Weights that pass information from input to output nodes (or in that direction between hidden nodes) are considered feed-forward connections. Weights that
pass information backwards from output nodes to input nodes (or in that direction between hidden nodes) are considered feedback connections. Bidirectional weights pass information both ways. Weights that connect units within a layer are considered lateral connections. The most common use of lateral connections is lateral inhibition, in which nodes within a layer attempt to turn each other off. The result of this process is that a few nodes have all the activation and the others have none.

Figure 2: A basic connectionist network. Circles represent nodes, and arrows weights. Values inside circles represent activations. The input nodes (activations 2 and .5) feed the output nodes through the weights .3, 1.5, 1, and .1, yielding the output activations 1.1 = 2*.3 + .5*1 and 3.05 = 2*1.5 + .5*.1.

Consider the example network in figure 2. This network consists of two input nodes and two output nodes (a 2x2 network), fully connected (each input node is connected to each output node) and feed-forward. The activations of the input nodes have been set to 2 and .5 by the modeler. To compute the values of the output nodes, we will use some function of the inputs and the weights. This function is called the activation function.

output_top = f(input_top, input_bottom, weight_top->top, weight_bottom->top)    (1)
The simplest activation function is the linear activation function. Each output node is simply the sum of the activation of each input node multiplied by the corresponding connection (weight) to that output node.

output_top = input_top*weight_top->top + input_bottom*weight_bottom->top
output_bottom = input_top*weight_top->bottom + input_bottom*weight_bottom->bottom    (2)

This can be generalized to:

output_y = Σ (x = 1 to Num inputs) input_x * weight_x->y    (3)
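Equation 3 is straightforward to sketch in code. The fragment below (a hand computation, assuming the activations 2 and .5 and the weights .3, 1.5, 1, and .1 from figure 2) implements the sum directly, and also shows the matrix form we derive next:

```python
import numpy as np

inputs = [2.0, 0.5]                # input activations from figure 2
# weights[x][y]: the connection from input node x to output node y
weights = [[0.3, 1.5],
           [1.0, 0.1]]

# Equation 3: each output node sums input * weight over all input nodes.
outputs = [sum(inputs[x] * weights[x][y] for x in range(2))
           for y in range(2)]
# outputs is approximately [1.1, 3.05], matching figure 2

# The same computation written as a single matrix multiplication:
outputs_m = np.array(inputs) @ np.array(weights)
```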
We can simplify this even further with some linear algebra. Let Output (with no index) become a vector of all the output activations, and Input (with no index) be a vector of all the input activations.

Input = [Input_top Input_bottom] = [2 .5]
Output = [Output_top Output_bottom]    (4)
Now let W be defined as a matrix where the row indicates the index of the input node (in this case, the top node would have an index or row of 1 and the bottom would have an index of 2), and the column indicates the index of the output node. The value at each position indicates the connection strength or weight.

W = [weight_1,1  weight_1,2]  =  [weight_top->top     weight_top->bottom   ]  =  [.3  1.5]    (5)
    [weight_2,1  weight_2,2]     [weight_bottom->top  weight_bottom->bottom]     [ 1   .1]
Then by the definition of matrix multiplication (which essentially says: for each output node, do equation 3, and concatenate all the results into a vector) we can simplify the whole thing into:

Output = Input * W    (6)
where ‘*’ indicates matrix multiplication, Output and Input are vectors, and W is a matrix. Ask someone to explain the linear algebra to you, and you will see it’s not too complicated. You should recognize, though, that equation 6 and equation 3 are doing the same thing, as you will often see it notated both ways. All of this so far has described the linear activation function. This activation function says that as you give input activation to an output node (as a function of the weights), the output activation will increase proportionally. This isn’t the only possible activation function, though. As equation 1 implies, virtually any function could be used (although modelers tend to limit themselves to simple, understandable functions that may be neurologically plausible). The most common nonlinear activation function (i.e. not equation 3) you will see is the logistic activation function. Without going into the math much, the
Figure 3: The logistic activation function. For any input activation to an output node, the logistic function outputs a value between 0 and 1.
logistic activation function serves to truncate the possible values of the output activation to a value between 0 and 1. If the sum of inputs*weights is high, the output node’s activation will approach 1. If that sum is low, the output node will have an activation near 0. Non-linear activation functions are crucial to the success of multiple-layer networks: it has been shown that for any network with more than two layers that uses a linear activation function, a two-layer network can be built that performs equivalently. Essentially, if you want to reap any advantage from having more than two layers, you have to use a non-linear activation function. The logistic function is a particularly good one, since the logistic function is what is known as a basis function. A basis function is a function that can approximate any other function if you add enough of them together (the Gaussian curve and the sine wave are other examples of basis functions). So, with a bunch of hidden units with logistic activation functions, a network can approximate many other functions by simply adding them together. Because of this, neural networks have been termed universal function approximators. Although connectionist models have often been associated with non-modular (or interactive) theories of processing, and tabula-rasa, statistically-oriented theories of learning, as universal function approximators, connectionist
Figure 4: A neural network trained to perform a dimensionality reduction and the corresponding “geometric” representation of that reduction. In this case, the network is mapping points in two-dimensional space (e.g. <2.0, 0.8> and <0.7, 1.5>) to points in one-dimensional space (along the dotted line). The values of the weights determine the equation of the line. Of course, not all networks perform a dimensionality reduction. Some keep the same dimensionality (merely shuffling the points in a predictable way); others will increase the dimensionality. The main point, though, is that the weight matrix serves to perform this “remapping”.
models can instantiate any sort of theory and should not be pigeon-holed into these particular lines of thought. As I mentioned previously, layers of nodes are often thought of as coordinates in a multidimensional space. Under this view, the weight matrix then performs a remapping of a coordinate in N-dimensional space to one in M-dimensional space, where N is the number of input nodes and M is the number of output nodes (see figure 4 for an example and explanation of this).
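As a quick illustration, the logistic function mentioned above is a single line of code; this sketch (input values chosen arbitrarily) shows how it squashes any summed input into the range 0 to 1:

```python
import numpy as np

def logistic(x):
    # Squashes any input activation into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(logistic(10.0))   # close to 1: a high summed input
print(logistic(-10.0))  # close to 0: a low summed input
print(logistic(0.0))    # exactly 0.5: the midpoint
```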
Representation
It is often useful to classify a model (or sometimes just a layer of a model) according to how it represents real-world information. A localist representation is one in which each node has a label of some kind, and when that node is active, it is in a sense saying “I think my label is correct.” An example of this is a layer of cells in which each node corresponds to a different phoneme, or one in which nodes correspond to various people. Often localist nodes are derogatorily called Grandmother Cells, after a famous thought experiment in which someone asked “What would happen if your grandmother cell was damaged? Would you be unable to recognize your own grandmother?”
Although a trifle silly, this question does raise the important point that localist representations are very susceptible to damage. If your only cell that recognizes /b/ is damaged, your network can no longer recognize that sound. Many other people have pointed out, however, that a node doesn’t necessarily stand for one neuron, but that it could stand for a whole population of neurons. Under this view, localist networks could easily survive damage. A distributed representation represents information across several cells. Sometimes this is completely arbitrary. The word “boy”, for example, may be the pattern [1 1 0 1 0 0] (for a layer with six nodes), while “botch” is [1 1 1 0 1 1]. Other types of distributed representations may assign smaller units of meaning to individual nodes, although the interesting meanings are distributed across them. In our previous example, if the first cell responds to a “b”, the second to an “o”, the third to a “t”, the fourth to a “y”, and the fifth and sixth to a “c” and “h” respectively, our representation of each word is still distributed, but each unit is now meaningful as well. Distributed representations are particularly valuable in that they can withstand damage well (if you knock out a single node, there may be enough information remaining in the other nodes to maintain the representation). They also implicitly encode similarity. In the example above, “boy” and “botch” are similar in that they both share the first two letters. As such, their distributed representations share two active nodes. Because of this similarity encoding, distributed representations can often generalize patterns they have seen to novel ones. Another type of distributed representation is the topographic map (AKA population code). In this scheme, a layer of cells represents some continuous value by location.
For example, in a layer of 10 cells that respond to sound frequency, the leftmost cells may respond highest to low frequencies and the rightmost to high frequencies. You can then recover which frequency the cells heard by looking at which cells fired. Topographic maps do not always represent their inputs linearly—they may have a lot of cells devoted to low frequencies and only a few to high frequencies, for example (this is what the output of a Kohonen network, which we’ll read about shortly, looks like). On some level, the debate over representation is a little pointless. It has been said that “one level of representation’s localist representation is another level’s distributed representation”. The two representational schemes are not terribly different and really depend on the level of description you wish to use and the way in which you wish to describe your model’s behavior and architecture. This is not to say that it is not important to have a good understanding of the way in which your model represents information, just that there are no hard lines between
distributed and localist representations, and one should not worry too much about the debate over them.
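The boy/botch example above can be sketched in a few lines to show how distributed representations encode similarity as shared active nodes (the six-node letter coding is the one from the text; counting shared nodes is just one illustrative similarity measure):

```python
import numpy as np

# Six nodes standing for the letters b, o, t, y, c, h.
boy   = np.array([1, 1, 0, 1, 0, 0])
botch = np.array([1, 1, 1, 0, 1, 1])

# Similarity is implicit: count the active nodes the patterns share.
shared = int(np.sum(boy * botch))
print(shared)  # 2: both words activate the "b" and "o" nodes

# Damage to one node leaves most of the pattern intact.
damaged_boy = boy.copy()
damaged_boy[3] = 0               # knock out the "y" node
```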
Learning
As you may have noticed, most of the interesting computational work in a neural network is done by the weights. At this point you are probably asking yourselves: how do I get the weights? That will be the topic of the next section: learning. I intend to keep to the more abstract conceptual level; however, an excellent description of the math behind the various learning systems can be found in Rumelhart and McClelland (1986) and McClelland and Rumelhart (1986). A good comparison of work in developmental psychology with connectionist learning can be found in Elman, Bates, Johnson, Karmiloff-Smith, Parisi and Plunkett (1992). The connection strength associated with each weight is usually set by a learning process (although in some cases, such as TRACE, they can be set by hand by the modeler to implement a specific theory). Each network has a learning rule that essentially tells the network how to modify its weights at any given point. Learning rules change the weights as a function of the activations of the input and output units, the value of the weight itself, and possibly some error signal—how close the actual output values are to the target output values (the ones you want the network to output). All learning rules have a component called the learning rate that determines how fast or slow the network can change its weights (essentially how much the network can change as a result of a single input). This gradual modification of weights leads to gradual change in the network’s performance. The challenge to the modeler is to use learning rules appropriate to the task the model is given so that this change is an improvement. The process of modifying the weights over time is learning (also training, or simply running a model). Regardless of the type of learning rule used, networks can be trained in two ways: batch learning and online learning.
In batch learning, the modeler presents each item in the training set to the network and computes its corresponding output activations. The weights are not changed until after the network has seen all of the possible input/output pairs, at which point they are modified using a learning rule. This forces the learning rule to consider all the input the network will ever see before changing any weights. The network will probably process the entire batch multiple times (each pass is usually called an epoch, though this term is often misused in the literature). Batch learning is often considered implausible (e.g. it seems clear that children do not wait until
they have heard every English sentence before learning to talk), but has the advantage of preventing a network from getting sidetracked by a single weird input. The more common training scheme is online learning. In this scheme, the model cycles for multiple iterations (AKA generations and sometimes, confusingly, epochs). At each iteration, a single item from the training set is chosen (either randomly or by fiat), and the activation of each input node is set according to that item. Output activation is computed via the weight matrix, and the weights are modified via the learning rule. This is then repeated again and again until the modeler decides to stop. Usually the weights settle (stay at approximately the same value from iteration to iteration) after some time—this is a good place to stop training. In most models, the model starts its “life” with a random weight matrix (essentially, each weight is a randomly selected value, usually within a small range). This ensures that the model does not start its life with any pre-knowledge of what it is to learn. It is also essential for many of the learning algorithms because initially, each output node will be biased differently in response to an input (if the network started out with a weight matrix consisting of all the same number, each output node would be equally biased towards everything and learning would be very difficult). So what kinds of things make up the learning rule? How does one know what to change the weights to? Modelers have been working on this issue for quite some time and have arrived at two broad categories of solutions: supervised learning and unsupervised learning. Supervised learning rules change the weights as a function of a teaching signal which is provided by the modeler to tell the network what it should be outputting in its output layer. This teaching signal is often considered part of the training set.
For our dinky 2x2 network, the modeler might provide a training set such as the one below:

If the network sees…    …it should output
[1 0]                   [1 0]
[0 1]                   [1 0]
[0 0]                   [0 1]
[1 1]                   [0 1]
Then at each iteration, the actual output can be compared with the target output (the output provided by the teaching signal) and each weight can be adjusted according to whether it was contributing to the correct output or not. This comparison is usually in the form of an error signal, the difference between the target and actual output. The delta rule (AKA the LMS rule) and back-propagation are two commonly used forms of supervised learning. The delta rule works very similarly to what I’ve described above. However, the delta rule does not work very well for multiple-layer networks (unless you have target values for the activation of the hidden units). Back-propagation is designed to send the error signal back through the hidden units (by transforming it via the weight matrix and a lot of messy calculus). Thus back-propagation can be used with networks of any size. Since a complete description of this requires calculus, I will wave my hands a bit and move on to the next section. However, I direct the interested reader to Rumelhart, Hinton and Williams (1986). It is important to note that back-propagation is not widely considered to be neurologically plausible: a neurological mechanism for passing error information back through multiple synapses has not been found, and, as I’ll discuss later, the source of the error signal itself can lead to biological implausibility. When doing supervised learning, modelers often want to talk about how close their model is to the target output. The most common way to do that is to compute the Mean Square Error (MSE). This is very simply defined. For any given input, compute the squared difference between each output node’s activation and its target activation (by squaring this difference, we make each difference positive, so every node’s error adds to the total error). Now take the mean of these numbers. That is the MSE.
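A minimal sketch of supervised learning with the delta rule and MSE follows. Note that it uses a hypothetical identity-mapping training set rather than the one tabulated above: that set is an XOR-like mapping, which a two-layer linear network cannot learn, so a linearly solvable set is substituted here.

```python
import numpy as np

# Hypothetical training set: the network should simply copy its input.
train = [(np.array([1.0, 0.0]), np.array([1.0, 0.0])),
         (np.array([0.0, 1.0]), np.array([0.0, 1.0]))]

rng = np.random.default_rng(0)
W = rng.uniform(-0.1, 0.1, size=(2, 2))  # small random initial weights
rate = 0.5                               # learning rate

for epoch in range(100):                 # online learning
    for inp, target in train:
        out = inp @ W                    # linear activation function
        error = target - out             # error signal
        W += rate * np.outer(inp, error) # delta rule

# Mean Square Error over the whole training set after learning:
mse = np.mean([np.mean((t - i @ W) ** 2) for i, t in train])
print(mse)  # effectively zero: the mapping has been learned
```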
Since this only tells you how well the model is doing on a single input pattern, many modelers will compute the MSE for each member of the whole training set to see how the model is doing. MSE is also a nice way to determine how long to train the model—simply present inputs to the model and run your learning rule until MSE is below some arbitrary cutoff point. Computing a single value for the performance of a network prompts many modelers to speak of the error space or the weight space. Consider a network with only two weights. If we look at all the possible values for these weights and compute the MSE for each combination, we could plot a three-dimensional error landscape where the X axis was the first weight, the Z axis the second weight, and the Y axis (vertical) the MSE. Supervised learning algorithms then simply search this error space for the point (combination of weights) with the lowest
MSE. They start from a random point (remember our weights are set to random values initially) and wander until they can no longer reach a lower point. In doing this search, a model may fall into what is called a local minimum. A local minimum is simply a point in this error space that is lower than all of its neighbors, but may not be the absolute lowest point. Training the same model from several different starting points (random weight matrices) is a good way to escape this potential pitfall, as you can be more confident that the final state is an absolute minimum. A classic back-propagation model is the autoassociator (AKA the autoassociative network). This network is a three-layer network with the same number of input and output nodes and a smaller number of hidden nodes (thus the network is performing a dimensionality reduction as activation flows from input to hidden nodes). The network is trained to repeat whatever input it is given. This may seem trivial, but it is in fact an interesting problem given the dimensionality reduction. For example, an autoassociator may represent a time-slice of a spectrogram by 100 nodes, but only have 4 hidden nodes through which to send that input to the output nodes. After computing hidden unit activations, it will need to recover 96 dimensions to go from hidden to outputs. In order to do this, of course, the learning rule must pick 4 dimensions to represent the input that are particularly important (account for a lot of the variance in the input). If this model is able to learn to perform its task, the sorts of hidden unit representations it learns may be very interesting. In this particular task, we might expect the hidden units to approximate acoustic features. Autoassociators bring up two very important concepts concerning back-propagation networks.
1) If you want to use your model to evaluate learning (ignoring for the moment issues about whether propagating the error signal is neurologically plausible), you must evaluate the plausibility of the teaching signal. It may be obvious that the teaching signal is doing a lot of the work in back-propagation networks. Since you could train a network to do virtually anything, given a good teaching signal, it is important to evaluate whether or not the signal you use is psychologically and/or biologically plausible. A word recognition network that is trained on acoustic input and told what the word is for each sound pattern is not very plausible, as real human babies don’t generally have access to this. An autoassociator, however, does have a
plausible teaching signal, since brains probably do have access to their inputs. However, if you are not interested in learning itself, but rather in whether or not a set of inputs is learnable, the plausibility of the teaching signal is not as much of an issue. 2) Hidden unit representations are important. In a lot of cases (such as the autoassociator), they are the only interesting results. It is crucially important to evaluate what your hidden units are paying attention to in the input. This, of course, can often be difficult or even impossible, particularly in cases where the hidden units seem to represent inputs in arbitrary distributed representations. Often, however, individual hidden units will have some meaning that may be interesting. Evaluating what sorts of inputs the hidden units respond to can be very difficult. The best way is to treat the hidden units as a psychological experiment. Present them with various inputs that you have varied systematically to test one or more hypotheses. Then try to find out if the activation of certain hidden units (or groups of units) can be predicted by those hypotheses. Unlike supervised learning, unsupervised learning requires no target values for the output—there is no right or wrong answer. Rather, weights are modified as a function of the input and output activations only. One of the most common unsupervised learning rules is the Hebb rule, proposed by Donald Hebb in the late 1940s. Hebb (1949) actually proposed this rule long before we knew anything about neural networks (computational or biological), and it turns out to have been very useful in the computational literature; it also has a close physiological correlate in a phenomenon called Long Term Potentiation, or LTP (that is to say that real neurons actually behave this way).
Although some people use unsupervised learning and Hebbian learning synonymously, the strict definition of Hebbian learning states that if an input node and an output node are simultaneously active, the strength of their connection increases. For example:

W_xy = W_xy + I_x*O_y    (7)
Here, if either I or O is equal to zero, there will be no change in weights. If they are both active, however, W will be increased. Since we can’t have weights increasing indefinitely, however, many modelers will include a weight decay term that decreases the weights when the nodes are not simultaneously active. Of course we will also want to include a learning rate (which we will abbreviate as ε):
W_xy = W_xy + ε(I_x*O_y - W_xy)    (8)
Here, if I and O are both active, we will increase W by a small amount. If they are not, we will decrease it by a small amount (the old value of W multiplied by the learning rate). Less common than Hebbian learning is anti-Hebbian learning, in which if an input and output node are simultaneously active, their connection decreases. Of course, there are many unnamed variants of these two unsupervised learning rules, but they are similar in that they do not depend on a teaching signal. One common scheme for using unsupervised learning is competitive learning (or winner-take-all learning; see Rumelhart and Zipser (1986)). In this scheme, before computing the weight change, the modeler sets the output node with the highest activation to one and all the others to zero. This is a simplification of a lateral inhibition process. Then the weights are changed according to a Hebbian or other unsupervised rule. The result of this sort of learning is that the model is able to find categories in the input (i.e. it will devote one output node to one category of inputs in the training set and a different output node to the others). Another common scheme is the Kohonen (1982) network (or Self-Organizing Feature Map, SOFM). A Kohonen network works very similarly to a competitive learning network, except that rather than exciting only the winner in the output layer, the winner and a number of its neighbors are excited together before applying the learning rule. The result of this is a distorted map of the input space in the output space, in which regions of the input space that occur frequently in the training set have lots of output nodes devoted to them and other regions have fewer. Hebbian learning has also been used in pattern completion networks (famous examples are the Brain-State-in-a-Box and the Hopfield network). These networks have only a single layer that serves as both the input and output layers.
All of the nodes in this layer are connected to each other (laterally), and these weights are modified with Hebbian learning. The model is trained on a series of patterns until the weights settle. Afterwards, the model can be given a partially complete pattern and will be able to fill in the rest. For example, a four-node pattern completion network may be trained on the following activation patterns: [1 0 1 0]
McMurray—Connectionism for… er… linguists
[0 1 0 1] With training, it will learn that when node #1 is on, node #3 should also be on, and that when node #2 is on, node #4 should also be on. So when presented with [1 0 _ 0], it will output the correct pattern, [1 0 1 0].
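This completion behavior can be sketched in a few lines of Python (my own illustration, not code from any cited model; the {0,1} to {-1,+1} recoding and the sign-threshold completion rule are simplifying assumptions):

```python
# A four-node Hebbian pattern-completion network, trained on the two
# patterns above.

patterns = [[1, 0, 1, 0], [0, 1, 0, 1]]
n = 4

# Lateral weight matrix: every node connects to every other node.
W = [[0.0] * n for _ in range(n)]

# Hebbian learning: strengthen the link between co-active nodes.
# Units are recoded 0 -> -1 so "off together" also counts as correlated.
for p in patterns:
    s = [2 * x - 1 for x in p]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] += s[i] * s[j]

def complete(cue):
    """Fill in a partial pattern; None marks the unknown units."""
    s = [0 if x is None else 2 * x - 1 for x in cue]
    out = []
    for i in range(n):
        if cue[i] is not None:
            out.append(cue[i])          # keep the clamped (known) units
        else:
            net = sum(W[i][j] * s[j] for j in range(n) if j != i)
            out.append(1 if net > 0 else 0)
    return out

print(complete([1, 0, None, 0]))        # -> [1, 0, 1, 0]
```

Because the two training patterns are complements, every known node votes consistently about the missing one, which is what lets the cue [1 0 _ 0] recover [1 0 1 0].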
Noise

When building connectionist models, we can make them pure and pristine, perfect examples of what cognition should be. However, this is rarely a useful idealization, since everything we know about the brain suggests it is as noisy as a debate on Chomskian language acquisition. To counter that objection, people often add noise to a system. This may seem abstract and weird, but all they are doing is adding small random numbers to something. Sometimes noise is added to the input layer before outputs are computed, sometimes it is added when the outputs are computed, and sometimes it is added to the weights. Using a weight matrix of small random numbers is another extremely common method of adding noise (although this is usually considered adding noise to the learning mechanism, without affecting the processing). Just know that adding noise is simply injecting a little randomness into the model somewhere.

Noise doesn't always degrade performance. Elman and Zipser (1988), for example, found that if they added noise to a speech recognition network it actually learned better, because the noise forced the network to create "noise-independent" representations of the speech. These representations were more useful in generalizing across speakers and contexts.

Another key point regarding noise is that once you add some to a network, your model is no longer deterministic. That is to say, every network is going to be slightly different (because you will be adding different random numbers to each instantiation). Because of this, you are not guaranteed that every network will be able to solve the problem, so it is a very good idea to run several different models under different noise conditions to determine how your model fares against noise. Conversely, when you read a paper in which noise is added to a model (even if it is just in the initial weight matrix), it is important to note whether the author ran the model several times. Otherwise, the possibility remains that he or she simply got lucky the first time (or didn't report the 200 models that failed). A network that generally solves the problem every time under differing levels of noise is said to be robust against noise.
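To make "adding noise" concrete, here is a minimal Python sketch (the function names and the Gaussian/uniform choices are my own illustrative assumptions) of two common injection sites and the multiple-runs practice:

```python
import random

def add_noise(values, sd=0.05):
    """Perturb a list of activations with small Gaussian noise."""
    return [v + random.gauss(0.0, sd) for v in values]

def random_weights(n_in, n_out, scale=0.1):
    """A 'noisy' initial weight matrix: small random numbers, not zeros."""
    return [[random.uniform(-scale, scale) for _ in range(n_out)]
            for _ in range(n_in)]

# Because noise makes the model non-deterministic, run it several times
# and look at the distribution of outcomes, not one (possibly lucky) run.
scores = []
for run in range(10):
    W = random_weights(3, 2)             # noise in the initial weights
    x = add_noise([1.0, 0.0, 1.0])       # noise on the input layer
    # ... train and evaluate the network here, then record its score ...
    scores.append(sum(x))
```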
Recurrence

Cognition often must unfold over time. In order for networks to capture this, recurrence is often added. Recurrence generally means that a layer's activation is in some way influenced by that layer's activation at a previous time. Some recurrent networks will have layers (such as an output layer) that are a function of themselves at previous times. For example:

output(t) = f(output(t-1), inputs, ...)    (9)
In the simplest case, the network may consist of only a single layer (which is both input and output) that simply connects to itself over time; the pattern completion network discussed earlier is one such example. Recurrent networks usually take time to process a single input (as activation flows back and forth between nodes). Often, giving a recurrent network an input and allowing it to process it is called running the network (although this can also refer to training).

Other networks may have layers with more indirect influences on themselves. The TRACE model (McClelland and Elman, 1986), for example, is a type of recurrent network known as an interactive activation model (or IAM). In this model, activation starts at the feature level and is passed to the phoneme level and then to the word level. The word level then passes activation back down to the phoneme level (via feedback connections), so that phoneme activation at time 2 is a function of both the feature input and information from the word level (which, of course, was determined by the phoneme level at time 1). This process cycles over and over again through time and predicts a number of results about the temporal dynamics of speech perception.

Another famous recurrent network is Elman's (1990) simple recurrent network (or SRN). These networks have been used to model all sorts of sequential behavior (of which language is probably the most interesting). They use back propagation for learning and are trained to predict the next input they will receive. For example, if they are learning sequences of words such as "the dog smiles" and "the boy eats", at any one instance of "the" the SRN will be trained on the very next word (such as "dog"). Over time, the SRN should report that "boy" and "dog" are highly likely (active) after hearing "the", but "eats" and "smiles" are not. Simple Recurrent Networks have a very simple structure that has turned out to be quite powerful.
Figure 5: A simple recurrent network. The network is trained to predict the next input. At each iteration, activation in the hidden layer is computed from the input and context layers (Hidden Layer = Input*Win + Context*Wcontext); the context layer is a copy of the hidden layer at time t-1. The weights into the hidden layer (from both the input and context layers) and the weights into the output layer are learnable.

Activation starts in the input layer and flows into the hidden layer. Activation in the hidden units is not computed from the input layer alone; rather, it is equal to the input layer multiplied by its weights plus the activation of the old hidden units (at the last time-step) multiplied by some other weights. Output activation is computed from these hidden units. Thus, when dealing with temporal stimuli (such as language), the SRN can base its outputs not only on the current input (for a word, for example, the current input might be a phoneme), but also on some of the previous inputs (the previous phonemes).

Although SRNs are trained using ordinary back propagation, many other recurrent networks are trained using a learning algorithm called back propagation through time. In this algorithm, the network is literally unfolded over time, so that the output layer at time 1 is one physical layer of the network and the output layer at time 2 is treated as an independent second layer of the network (see Figure 6). The network can then be trained as a regular old multi-layer network, and the changes to all the weights (remember, since each layer
of nodes really consists of the same nodes, each weight matrix is really the same weight matrix) will be combined to compute the final weight changes.

Figure 6: Back propagation through time. a) A single-layer, fully recurrent network (Layer X). b) The same network, unfolded through time before learning: each of the three layers (Layer X at times 1, 2, and 3) represents the layer in (a) at a different time, and each set of weights is the same recurrent set as in (a). Learning occurs as if the network in (b) were an ordinary three-layer network; after learning, the modifications in the upper set of weights are combined with the modifications in the lower set to change the final weight matrix.
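One time-step of the SRN described above can be sketched as follows (a minimal illustration: the layer sizes and hand-set weight matrices are my own assumptions, and a real model would learn all three matrices with back propagation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (rows x cols) by vector v."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def srn_step(x, context, W_in, W_context, W_out):
    # Hidden activation combines the current input with the previous
    # hidden state (the context layer).
    pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_context, context))]
    hidden = [sigmoid(p) for p in pre]
    output = [sigmoid(o) for o in matvec(W_out, hidden)]
    return output, hidden            # hidden becomes the next context

# Processing a sequence: feed each element, carrying the context along.
W_in = [[1.0, -1.0], [0.5, 0.5]]
W_context = [[0.2, 0.0], [0.0, 0.2]]
W_out = [[1.0, 1.0]]
context = [0.0, 0.0]
for x in ([1, 0], [0, 1], [1, 1]):
    output, context = srn_step(x, context, W_in, W_context, W_out)
```

Copying the hidden activation into the context layer at each step is all the "memory" the SRN has; that copy is what lets the current output depend on previous inputs.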
Genetic Algorithms

One emerging technique in connectionist modeling is the application of genetic algorithms. These algorithms seek to "breed" networks by using a technique reminiscent of biological evolution. Essentially, each network is assigned a genome that records its properties (such as the number of hidden units, the learning rate, or the values of the weights). The most common scheme for encoding this genome is a string of bits (ones and zeros). Each group of bits, or gene (the first 10, for example), encodes (in base-2) the value of whatever parameter that gene represents.

Once the form of the genome is determined, a large number of networks is generated by creating genomes at random. These networks are all trained, and after they have all been run, their fitness function is evaluated. This function essentially tells the algorithm how well the network did at accomplishing its task. The next generation of networks is then created by combining the genomes of the networks with the highest fitness values. Sometimes mutations are allowed to creep in by randomly changing one or more bits of the genome. There are literally thousands of different mechanisms for evaluating fitness, organizing the genome,
computing the genomes of the next generation, and applying mutation. I direct the reader to Mitchell (1999) for a good introduction to them.

There is nothing mathematically special about genetic algorithms. They simply form another class of search tools for fitting a model to data; other classes include learning rules like back propagation and statistical optimization techniques like Maximum Likelihood Estimation. The reader should bear in mind that among these optimization tools, genetic algorithms are the most poorly understood and may not be the most efficient (they will often take longer to solve a problem than other techniques). Genetic algorithms are popular mostly because of the (to some people) compelling biological analogy they provide. However, a close look suggests the analogy may not be as compelling as many people think.

Researchers have used genetic algorithms to set the weights of a network as well as to determine features of the architecture (number of nodes, connectivity, learning rule, etc.). However, if you accept the majority view that weights encode learned knowledge, it is hard to accept the evolutionary analogy for genetically determined weights, as we have yet to find evidence for inherited knowledge. Moreover, when genetic algorithms are used to determine the architecture of a model, it is often extremely difficult to understand how the model solves a particular task and how the genetic algorithm arrived at that solution. Because of this, such models are not good instantiations of a theory (since the theorist did not determine how the model processes information; "evolution" did), unless your theory is a theory about evolution (and then you run into the problem that the model of evolution in most genetic models is quite bare). I am not trying to say here that genetic algorithms are useless.
They do have their place in connectionism, but we must exercise caution in building them (and reading about them) to be sure that we are saying something interesting, interpretable and new about cognition. To really achieve any utility we must constrain the algorithms to the point where we can understand the output.
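As an illustration of the basic loop (bit-string genomes, a fitness function, selection, one-point crossover, mutation), here is a toy Python sketch. Everything in it is an assumption for demonstration: the genome encodes a single base-2 number, and the fitness function is a stand-in for "train the network and score it".

```python
import random

GENOME_LEN = 10          # one 'gene'; real genomes hold many parameters

def decode(genome):
    """Read the bit string as a base-2 number."""
    return int("".join(map(str, genome)), 2)

def fitness(genome):
    # Stand-in for "train the network and evaluate it": this toy version
    # just rewards genomes whose decoded value is close to a target.
    return -abs(decode(genome) - 300)

def evolve(pop_size=20, generations=30, mutation_rate=0.02):
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]             # keep the fittest half
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, GENOME_LEN)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if random.random() < mutation_rate else bit
                     for bit in child]             # occasional mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = evolve()
```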
Damage and Lesions

A growing body of literature has begun to examine what happens when a network is damaged. This has been particularly fruitful in language research, as it is often useful to compare the output of a lesioned network with that of an aphasic. Much like the use of noise in connectionist networks, lesioning a network is much less complicated than it might seem.
Researchers have come up with two major ways of damaging a network. The first is simply to remove some connections (weights) between nodes by setting them to zero permanently. The second is to remove one or more nodes (typically hidden units). In both cases, people have looked at damaged networks in two ways. Often they will simply compare their performance after the damage with real data from patients. Other times, after receiving the damage, the network will undergo some more training as a simulation of recovery. This is particularly interesting in the case where hidden units are lost (in an autoassociator, for example) as this asks the question of whether the network can successfully adapt to having fewer dimensions with which to represent its inputs. As I mentioned previously, the way in which damage is dealt with is one way in which localist and distributed representation schemes differ since distributed representations can deal with it more gracefully. Most networks exploring the effects of damage use distributed representations for this reason.
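The two forms of damage described above reduce to very small operations on a weight matrix. A sketch (my own illustration; the matrix shape, proportion parameter, and helper names are assumptions):

```python
import random

def lesion_connections(W, proportion=0.2):
    """Permanently zero a random proportion of the weights."""
    lesioned = [row[:] for row in W]
    for i in range(len(lesioned)):
        for j in range(len(lesioned[i])):
            if random.random() < proportion:
                lesioned[i][j] = 0.0
    return lesioned

def lesion_units(W, n_remove=1):
    """Remove whole hidden units (here, rows of the weight matrix)."""
    keep = random.sample(range(len(W)), len(W) - n_remove)
    return [W[i] for i in sorted(keep)]

# A 5-hidden-unit by 4-input weight matrix, damaged both ways.
W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
damaged = lesion_units(lesion_connections(W), n_remove=2)
```

After damaging a network like this, one can either compare its behavior directly with patient data or continue training it to simulate recovery.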
Discussion

I'll prewarn the reader that as I attempt to sum up this article, my discussion is likely to turn into a personal pulpit for how connectionism should be done right. Other authors disagree with me, of course: many of these issues are under active debate, and those that aren't have simply not yet surfaced as dominant issues in the literature (although I predict that they may soon).

Connectionism has rapidly become a dominant tool for expressing and quantitatively modeling theories about psychological and neurological phenomena. Its use is growing in linguistics, and it is our hope (on the psychological side of the fence) that more linguists will begin to add it to their theory-building toolboxes. It has been shown that given enough hidden units and enough layers of hidden units, back-propagation networks can learn to solve any problem (whether or not they can help my love life is a different story…). As a result, when evaluating network models we need to determine a lot more than whether or not the model does the task. We also need to ask:

1) Is the structure of the model neurologically plausible? Does the model perform computations that real neurons could not possibly do?

2) Are the posited input and output representations psychologically and neurologically plausible? A model that builds syntactic trees and is given parts of speech may not be all that interesting (unless we have a
good model of part-of-speech tagging), since it is unlikely the syntactic processor is simply given these…

3) What feature of the model allows it to solve the problem? How does it solve it?

4) Does the time-course of learning and/or processing match the same time-course in humans?

5) And most importantly, what is the linking hypothesis between the model and the data? Models do not output eye-movements, button-presses, EEG waves, grammaticality judgments, or reaction times. Whenever we relate model output to actual data, we must form some linking hypothesis as to how this relationship holds. It is crucial that this be made explicit and that it be well reasoned. This linking hypothesis is just as important a part of theory building as the model itself: the same model with different linking hypotheses can often yield strikingly different results.

When building a model, one needs to keep similar issues in mind. Although there is a large engineering literature that focuses on building models with the single goal of solving a particular problem, for the most part connectionist networks in psycholinguistics and linguistics are built to instantiate a theory of language processing or learning (or some other aspect of language). In these models, there are a number of decisions to be made, and the best modelers will make these decisions on the basis of the theory they are trying to instantiate.

1) Localist or distributed representation? If a goal is neurological plausibility, distributed representations may be preferred (as grandmother cells have not yet been found in the brain), though a topographic map may be even better. If the goal is to relate output to discrete experimental responses, then a localist representation may make that easier.

2) What is the goal of learning? If you wish to model the time course of development or acquisition, maybe a more neurologically plausible unsupervised scheme is best. However, if you merely wish to show that a particular categorization or mapping is learnable from the input, a supervised learning rule may suffice. This distinction is not very clear-cut in the literature (many developmental arguments have been made with back-propagation), but it is important to keep in mind when building the model. If you do use a supervised learning rule, what is the basis of the teaching signal? Could it arise in real life with real brains/minds? Maybe you aren't interested in learning at all, but rather are more interested in exploring processing mechanisms. Here
you may even consider setting the weights manually, or with a genetic algorithm.

3) Are you striving for a completely neurologically plausible architecture, or is an abstraction enough? The answer to this can often constrain all the architectural choices you might need to make.

Because of the power inherent in connectionist networks, and because they are often as opaque as the cognitive system they are attempting to model, several cautions must be exercised. Models must be developed to implement specific theories, and a specific linking hypothesis must be formed linking the model with the data. The architecture of the model should be grounded in good linguistic and psychological theory and should be tied to the theory we wish to instantiate. We should make every attempt to understand how a network solves the task, not just that it solves it, constraining our architectures toward this end if necessary. Finally, we should systematically explore the models we develop in a style similar to that of good psychological experimentation. We should always compare multiple instantiations of the same model. The effect of different sources and levels of noise should be systematically explored. Modelers should test the architecture of the model by looking at the effects of individual components of the network (e.g. running a network both with and without lateral inhibition). Lastly, models should be developed so that they can be directly compared to other models of the same phenomena. In the long run, only by combining these cautions with knowledge of the neuroscience, mathematics and psychology behind connectionist modeling will connectionism ultimately prove useful as a tool for conceptual understanding and theory testing.

References

Elman, J., Bates, E., Johnson, M., Karmiloff-Smith, A., Parisi, D., and Plunkett, K. (1996) Rethinking Innateness: a Connectionist Perspective on Development. Cambridge, Mass.: The MIT Press.
Elman, J. (1990) Finding structure in time. Cognitive Science, 14, 179-211.

Elman, J. and Zipser, D. (1988) Learning the hidden structure of speech. The Journal of the Acoustical Society of America, 83(4), 1615-1626.

Hebb, D. (1949) The Organization of Behavior. New York: Wiley.

Kohonen, T. (1982) Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.

McClelland, J., and Elman, J. (1986) The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

McClelland, J., Rumelhart, D., eds. (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2. Cambridge, MA: The MIT Press.
Rumelhart, D., Hinton, G., and Williams, R. (1986) Learning internal representations by error propagation. In Rumelhart, D., McClelland, J. (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. Cambridge, MA: The MIT Press. 318-362.

Rumelhart, D., McClelland, J., eds. (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. Cambridge, MA: The MIT Press.

Rumelhart, D., and Zipser, D. (1986) Feature discovery by competitive learning. In Rumelhart, D., McClelland, J. (eds.) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. Cambridge, MA: The MIT Press. 151-193.
Acknowledgements

This paper grew out of a handout for a series of discussions on the relationship between linguistics, psychology and connectionist models in speech perception that included Mike Tanenhaus, Mikhail Masharov, Jim Magnuson, Katherine Crosswhite and Joyce McDonough. As a group they've contributed to this paper by introducing me to linguistic thinking, and forcing me to think about connectionism from this viewpoint. I'd like to thank Katherine in particular for helping to make this readable to linguists and Robbie Jacobs for helping to make this readable to connectionists by keeping my facts factual.
University of Rochester Working Papers in the Language Sciences, Vol. Spring 2000, no. 1 Katherine M. Crosswhite and Joyce McDonough (eds.)
"Mismatches" of Form and Interpretation

Greg Carlson ([email protected])
Department of Linguistics, University of Rochester, Lattimore Hall, Rochester NY 14627 USA

Note: This is the text of a talk given at the "Semantics meets acquisition" workshop at the Max Planck Institute in Nijmegen, March 31-April 2, 2000.

The theme of this conference, expressed as "semantics meets acquisition," has an amusing ring to it, as it hearkens back to B-movie titles such as "Wolfman meets Frankenstein." This latent reference to movie monsters turns out to be apt, in a certain sense: when I think about the twin issues of acquisition and semantics, and how to put them together, it does seem a monstrously hard problem indeed. Were this presented to me as an abstract problem in a form that I didn't recognize as really about learning and meaning, I'm sure I would throw up my hands and soon declare the problem insoluble. But this of course would be a misjudgment, as it is contradicted by the simple daily facts of the world.

In this talk I wish to take a fairly superficial, perhaps even ignorant or naive, perspective on matters of meaning and learning. I am going to assume that language learners have, at best, access to knowledge of surfacy kinds of linguistic information, and some knowledge of context, and present in overview style some of the challenges learners might face in trying to construct a consistent form-to-meaning mapping.

One way to begin thinking about the issue is "from the top," so to speak. The experience of extracting information from natural language utterances is a global one: the experience is that of understanding something you didn't before the utterance event occurred, and that's about it. This does not distinguish, for instance, among presuppositions, conveyed meanings, implicatures, literal or metaphorical meanings, nor any other information derived from the utterance, e.g. location, gender, emotional state of the speaker, etc.
Some take this intuition about the unity of our experience at face value; I regard this as the underpinnings of "holism of meaning". But I, and many others, believe that messages extracted from natural language are susceptible to analysis, and upon analysis it becomes clear that meaning in toto is composed of a variety of distinguishable factors.

Let me draw a parallel: upon hearing a single word, say, the English word "cats," one has the experience of hearing a noise and pairing it with a certain type of animal, very roughly. And that's about it. There is nothing, I believe, in this experience that comes identified as also experiencing "a word," "two morphemes," "a stem," a feature [-sonorant], "Noun," and so on and so forth. Yet, upon analysis it becomes clear that this experience is somehow informed by a constellation of such factors, that all these factors or factors like them contribute their part to the whole. I take it that the experience of meaning is likewise amenable to such analysis, and when one considers the factors it becomes
clear that "meaning experienced" in its broadest sense results from a combination of similar factors, factors that do not wear their rank on their sleeves but which become apparent upon consideration through the lens of theory.

When we talk about "semantics," we intend a certain component of meaning: that component which is in some sense referentially based and which is connected most intimately with the syntax of natural language. I'm going to code this as the "truth-conditional" aspect of meaning, a phrase I use here for convenience rather than in its fullest theoretical sense. This is the aspect of meaning which, I believe, is absent from otherwise meaningful objects and events, such as the dark colors in a painting, the rattling sound in my car, music, and, apparently (though I want to be a bit careful here), animal communication systems. However, language clearly conveys meaning in ways in common with such things as well.

Consider a point emphasized in Grice's work on conversational implicatures. He takes pains to point out that these implicatures apply to actions in general, not just the linguistic actions of executing utterances. So, for instance, one can congratulate someone by patting them on the back or shaking their hand, or one can do it linguistically by saying something like "Way to go there, Bob" or by using the stodgy performative utterance "I hereby congratulate you on your success." Meanings of actions, then, including linguistic actions, contribute one component to the meaning of the whole.

Another type of meaning that is not commonly discussed in truth-conditional approaches is connotative meaning, associated with words. To learn a language is to learn, in part, facts like: "butt" is a cruder way of making reference to certain body parts than "hind end," and "derriere" is almost affectedly silly in most contexts, despite common reference.
Such social/emotional meaning is omnipresent in language, and seems most highlighted in poetry, song lyrics, and corporate presentations, but it is a type of meaning clearly present in nonlinguistic artistic objects and events as well. Background cultural knowledge also informs meaning. For instance, it is not a good idea in English to wish someone a refreshing night's sleep by saying "Rest in peace," as this is a formulaic phrase that used to appear routinely on gravestones. Or, in Norwegian one should not literally thank someone for everything (as one can in English), as the literal translation is a phrase found commonly in obituaries.

My purpose here is not to enumerate or catalog the variety of meanings that the use of natural language gives rise to. Rather, it is to make the point that when we begin to talk about the semantics of a quantifier or the scope of tense marking, and how they might be acquired, we are already a long ways from the starting gate in considering the general issue of meaning and language. Meaning comes at us, and at people learning a language, from a variety of different directions, at a large number of levels, and only one among them is the subject of the kinds of semantic theories I and many others are used to working with. And, apparently, it is a component of overall meaning learners must identify.

Even restricting consideration to this semantic aspect of meaning, the difficulty of the problem of learning hardly abates. Obviously, perhaps most obviously, one must learn the meanings of the words of the language (or a significant subset of them, at any rate), and there are many terrifically interesting learning issues that have been explored within this domain, at least in the area of learning meanings of
the content words (nouns, verbs, adjectives, in the main). One absolutely immediate problem that comes up here is ambiguity. I would like to point out that the problem, even at the lexical level, is of Godzilla proportions: by one count, the 500 most common words of English have among them nearly 9,000 different meanings listed in the OED, or about 17 meanings per entry on average. Granted, a good many of them are low-frequency or even archaic usages that are learned later in life, if at all, and in context none are remotely that ambiguous. But counterbalancing this somewhat is the fact that many ambiguities are not included systematically in this count. Type/token ambiguities are systematically associated with nouns (hence the ambiguity of "All the machines at the arcade are for sale": either those actual machines, or other individual machines of the same design). Metonymic reference is not reflected there, as in the commonly cited practice of waitpersons referring to customers by their orders, resulting in ambiguities in sentences like "The ham sandwich is ready to eat." Many ambiguities of thematic role assignment, which are astonishingly common, are missing: "John shoveled the cement" has the cement either as the location cleared or as what was moved; "Sally packed the suitcases" can have the suitcases being put into things, or things being put into them; etc.

I don't see how any of this helps a language learner. To pair a new meaning with a word for which you already have a meaning, it appears one must, in the first place, notice there is an error, which requires tremendous sensitivity to context and what is appropriate in a context; in the second place, one must also localize the error: if one hears "That actor is a ham" and notes that the utterance is wrong in context for the "smoked meat" sense of "ham," why not conclude the error is due to "actor," "that," "is," or "a"? Or what?
But understanding an utterance of a sentence or discourse involves much more than just understanding the meanings of lexical items and resolving ambiguities within them in context. It involves consideration of the ways the lexical items are combined with one another, and here, as is well known, the problem of ambiguity hardly goes away. Also, as we know, the linear order of words can make an essential difference in meaning ("Dogs chase cats" vs. "Cats chase dogs"), but in many other instances there is no difference (cf. "scrambling" structures that appear in most languages, such as "We have food enough for everyone" vs. "We have enough food for everyone").

But perhaps most interestingly to me, a vital part of this combinatory semantics is ferreting out the contributions of all those "little" words to the meaning of the whole. Considering the contributions of these not so apparently referential things is a central focus of semanticists: what is the meaning of a tense marking, a modal, an indefinite article, a reciprocal expression, "if," "how," "which"? What does an infinitive marker do, a plural ending, negation, pronouns? Consideration of these functional elements of meaning introduces issues concerning the mapping between forms and meaning that are either absent or obscured when one concentrates primarily on the semantics of lexical items or on grosser aspects of sentence meaning such as argument structure.

Consider one of my favorite examples, the Classical Latin conjunctive particle "-que". Latin had this alongside the conjunction "et", but the syntax of the two was not the same.
"Et" appeared, from a semantic point of view, right where it is supposed to: between the elements conjoined, like most conjunctions we're used to seeing. The enclitic "-que", on the other hand, appeared attached to the end of the first word of the phrase conjoined. Thus in (1) "-que" appears after the first word but signals that the whole phrase is a conjoined element, and not just the word "two":

(1) duasque ibi legiones conscribit
    "...and there he enrolled two legions"
In a slight wrinkle probably driven by prosodic considerations, it appeared attached to the second word if the first was a monosyllabic preposition:

(2) ob easque res
    "...and because of these things"
If one treats "-que" as having the meaning of a conjunction and compositionally combines it with whatever it combines with in the surface syntax, one would not be able to get these meanings. Instead, one must in some sense raise it up to a higher position in the tree structure and put it in its rightful place. (This is a lot like QR, of course, with the notable difference that in the case of "-que" one does not wish to leave a variable behind.) Of course, in such examples "-que" is not in any wrong place (to put it elsewhere would be wrong), because the grammar says it is to be put where it is. But from a compositional semantic point of view, one needs to do some rearranging that one does not have to do with "et," "and," "und," etc.

I'm not raising this as a curiosity, a funny little fact to note and tuck away. The position of "-que" is of course a Wackernagel position phenomenon, one so common it has a name like that, and it is possible to produce many more similar examples. But the semantic phenomenon of having to "rearrange" elements extends well beyond Wackernagel position particles. Consider how common it is to treat tense, for instance, as both syntactically and semantically a higher-level operator, and for good reason. A very common type of example from English VP deletion will illustrate the point: the deleted VP in (3) does not carry the tense information of the antecedent VP, even though tense is expressed as an inflection on the verb:

(3) John wrote a paper because he had to (*wrote a paper).

Or, it appears plurality must be dissociated from the noun it appears attached to, by similar evidence:

(4) John has two dogs and Fred has one (*dogs).

It would be quite easy to extend this listing to include a lot of other inflectional categories, as is commonly done in work on both semantics and syntax. But let me move on, noting that it is probably extraordinarily common to have functional, including inflectional, elements not, in some sense, in their proper
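The rearrangement a compositional semantics must perform here can be sketched as a toy procedure. The sketch below is illustrative only: the tokenization, the `CONJ` label, and the treatment of phrases as plain token lists are my own assumptions, not anything from Latin grammar itself.

```python
# Toy sketch: an enclitic conjunction ("-que") surfaces attached to the
# first word of its conjunct but must be interpreted as joining two
# whole phrases. All representations here are hypothetical.

def interpret_que(prev_phrase, surface_tokens):
    """Lift a word-internal '-que' to a phrase-level conjunction."""
    conjunct, found_que = [], False
    for tok in surface_tokens:
        if tok.endswith("que") and not found_que:
            conjunct.append(tok[:-3])   # strip the enclitic from its host word
            found_que = True            # remember that we saw it
        else:
            conjunct.append(tok)
    if found_que:
        # Semantically: CONJ(previous phrase, the WHOLE conjunct),
        # not a conjunction of the two adjacent words.
        return ("CONJ", prev_phrase, conjunct)
    return conjunct

# "duasque ibi legiones" => CONJ(..., [duas, ibi, legiones])
print(interpret_que(["..."], ["duasque", "ibi", "legiones"]))
```

The point of the sketch is only that the enclitic's surface host and its semantic scope come apart, so interpretation must "raise" it before composing.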
place. Is this something we're born knowing already? That would help, it seems, but how can one tell? Coalescence between adjacent functional elements is extraordinarily common; it is the classic definition of what is meant by an "inflectional" language as opposed to an agglutinative one. Coalescence may also occur with otherwise free morphemes, as with the preposition/article coalescence found in Germanic and Romance; thus, French "du" is in some sense the equivalent of "de + le". From the commonsense point of view taken here, this probably doesn't present any special difficulties, but there is a similar process that well could: a variant of haplology in which a sequence of two formally identical elements is reduced to one. This does not, to my knowledge, occur with lexical items (thus, "a bare bear" does not reduce to "a bear", meaning a BARE bear). Consider the case of Japanese -no, noted by Kuno, Radford, and others. It has two quite distinct functions, as a possessive postposition and as a pronoun (meaning something like "one"). If these are juxtaposed, as in (5a), you get an ungrammatical sentence. But there is a nonperiphrastic way of expressing this, namely (5b), with only one instance of -no. Yet both the possessive meaning and the pronominal meaning remain:

(5) a. *Kore wa anata no no desu ka
        this TOP you POSS one be Q
        "Is this yours?" (lit. "your one")
    b. Kore wa anata no desu ka
Again, this is hardly a funny little isolated fact. One can multiply examples by the dozens in familiar and unfamiliar languages alike, and, as usual, when one looks for something like this, it seems to be all over the place. The Swahili negative past marker ku-, occurring right next to the infinitival marker ku-, reduces to a single ku- prefix, yet both meanings remain. In certain Turkish word forms two plurals "ought" to appear in a row, but only one appears, though semantically there are two plurals (in the case of NPs like "their books", where both the possessors and the things possessed are plural). The special problem examples like these raise is that, from a surfacy point of view, you have one element with two meanings, or the same meaning assigned two different scopes, as in the Turkish example. But I thought it was almost an axiomatic fact of perception that a single form could not be assigned two different meanings. Not only does this apply to lexical items ("He sat by the bank" cannot mean he sat by the river and by a financial institution), but it applies to perception more generally: this is Necker cube stuff. This would seem a prime case of putting the learner squarely behind the eight ball, yet there it is. We have not only the case of one form with two meanings to be concerned about, but also its converse: two (or more) forms that add up to a single meaning. One reflection of this is discontinuous morphology. So, for instance, Nida cites the Kekchi example in (6):
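The haplology cases have a simple schematic shape: the morpheme string that feeds interpretation contains two adjacent identical functional elements, while the surface string realizes only one. A minimal sketch (the `FUNCTIONAL` set and the segmentations are my own simplifying assumptions):

```python
# Haplology, schematically: adjacent identical functional morphemes are
# realized only once on the surface, but interpretation still sees both.
# The FUNCTIONAL set and the segmentations below are hypothetical.

FUNCTIONAL = {"no", "ku"}

def surface_form(morphemes):
    """Delete the second of two adjacent identical functional morphemes."""
    out = []
    for m in morphemes:
        if out and out[-1] == m and m in FUNCTIONAL:
            continue                    # one surface form, two meanings
        out.append(m)
    return out

# Japanese: anata + no (POSS) + no (pronominal 'one')
print(surface_form(["anata", "no", "no"]))
# Swahili-style: negative-past ku + infinitival ku (segmentation hypothetical)
print(surface_form(["ha", "ku", "ku", "soma"]))
```

The learner's problem is the inverse direction: recovering the two-morpheme input from a one-morpheme surface string.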
(6) očoč "house"
    ročočeʔp "their house"
French "ne...pas" would be a possible candidate for a more familiar example. But far more commonly this is found in agreement or concord forms: an agreeing plural article, two plural adjectives, and a plural noun add up to just one plurality, not four. A definite article combined with the definite form of a noun still adds up to one definite. Multiple negations, as in the Old English example in (7), add up to a single negative:

(7) Ac he ne sealde nanum nytene ne nanum fisce nane sawle
    and he NEG gave NEG beasts NEG NEG fish NEG souls
    "And he did not give beasts or fish souls"
Such examples are so familiar we might easily overlook the language learning problem: if we build a signal detector that generates an associated meaning upon encountering a certain type of form, we're going to get extra meanings all over the place which are not part of the actual interpretation as best we can determine it. Note that the strategy of treating certain forms as meaningless, and localizing the meaning in just one of the forms, may work in some instances but not generally. Let's take a really simple example, the English phrase "these houses". Two plurals, so let's treat the one on the noun as "real". The problem is that "These have wooden doors" has a plural subject, semantically and in all other respects, and so does "Houses have wooden doors".

It also appears on occasion that sounds are not paired with meanings. We are all used to work on expletives, so I'll draw on examples from another domain, that of Classical Latin semi-deponent verbs. Latin had a productive inflectional passive marker that normally signaled passivization (i.e., the subject is semantically the direct object), but in many semi-deponents, while the present tenses were formed from the usual active paradigms, the perfect forms required the passive morphology, without a corresponding effect of passive meaning. Here's a textbook example in (8):

(8) audeo "I dare"
    ausus sum "I dared" (not "I was dared")
Or, consider the habitual markers that appear in contrafactuals in some languages. In (9) is an example from Hindi due to Bhatt (1997):

(9) a. ??Meera do baje bhaashaN de rahii ho-tii (hai)
        M. 2 o'clock speech give PROG be-HAB (PRES)
    b. agar Meera kal do baje bhaashaN de rahii ho-tii...
        if M. yesterday 2 o'clock speech give PROG be-HAB
        "if Meera had been giving a speech yesterday at 2:00..."
Here, there is no discernible semantic contribution of the HAB marker in (9b), while in (9a) its presence makes the point-time adverbial sound strange (as
generalizations are often odd if given point-time readings), but not so in (9b). English pluralia tantum ("scissors", "pants") or dependent plurals (as in "Unicycles have wheels") would be possible examples of a plural making no semantic contribution. I'll not go on, but language seems to have many instances of interpretable elements that, in given constructions, bear no such meaning, or seemingly any meaning at all. Or, what they can do is bear other meanings instead. An illustrative case is the Spanish "spurious se," first discussed, to my knowledge, in the generative literature by Perlmutter. In sequences of Spanish clitics, if the third person indirect object clitic appears before a third person direct object clitic, it is realized as "se", which is normally taken to be a reflexive form (though of course it has other functions as well). However, the meaning is not (necessarily) reflexive:

(10) Se lo mandas. (*Le/Les lo mandas.)
     "You send it to him/her/them"
Again, this might at first appear a funny little fact, but forms that are, from a transformational point of view, mapped to other forms in context are extraordinarily common. Consider sequence-of-tense phenomena, where a past tense appears in a subordinate clause but has a reading contemporaneous with the interpretation of the higher tense, as if, semantically, it were a present tense. In preposition/pronoun inversion in Germanic (now lost in English except in frozen forms like "thereupon" or "therefore"), a (neuter) personal pronoun seems expressed instead as a locative, as in German "damit", "darauf". In Greek, we find in certain contexts imperfectives that appear to contribute perfective meaning, as in wishes and contrafactuals. Again, we're not looking at some spotty little curiosities, to my mind, but rather at features which detailed analysis and study show, and will show, recur time and again.

And, finally (I'll not dwell on this), there are wholesale instances of "silences," of null elements, or elements contrastively omitted, which mean something. We have null pronouns, null determiners, null agreement markers, null anaphoric devices of all sorts. And so we need somehow to sort through those silences that are significant from those that are not.

Let me talk for a time about a line of research in which null and expletive elements play a significant role. Many languages have, as one form of a noun phrase, the possibility of there being no overt determiner, quantifier, or other similar element. Languages without articles use such noun phrases extremely commonly, but most languages with both definite and indefinite articles also have them. These are most commonly restricted to mass terms and the plural forms of count nouns, and in many languages they are restricted in their syntactic occurrences (though not in English).
From a linguistic point of view, the absence of a determiner in bare plurals and mass terms invites the notion that there is a null determiner or quantifier present; this is a very natural thing to consider. But what semantic contribution would this null determiner make to the whole? The answer would change with the context, it appears. In some contexts, it would appear simply to be an existential, as in "I
bought apples at the store" ≈ "I bought some apples...". But in other contexts the contribution would have to be different: "Apples contain vitamin C" does not mean "Some apples contain vitamin C", but something a lot more like all, or most, apples. What has emerged in the past fifteen years or so is a kind of consensus that one should not look to the empty determiner position, if there is one, as a kind of ambiguous quantifier to give a proper account. Rather, the quantificational force is gotten from other elements of meaning in the sentence that the noun phrase combines with; treating them as indefinites within a DRT framework is one way of expressing this view. Some people posit null determiner positions in such noun phrases; others don't. But no one, to my knowledge, is currently wrestling with the question of trying to systematically accord it some lexical contents.

One type of fact that militates, in a general way, against the view that quantificational force should be localized in a null determiner is the phenomenon of scopelessness. On the existential reading, and on the more general reading as well, these noun phrases do not interact scopally with other sentential elements, such as negation or other quantifiers, to produce the characteristic scopal ambiguities (here I'm setting aside a few widely-known exceptions). One could equally well represent these facts with a null determiner, or no determiner at all, at least at this level.

Now, one somewhat unfortunate side effect of this line of work has been to pass down as folklore (and I've been responsible for some of this) the idea that bare singulars "don't exist" in languages with articles. Oh, sure, there are a few, but basically "Book fell on the floor" and "I found dog" stand in contrast to "Books fell on the floor" or "I found dogs", and "I was watching television" with its bare singular is an idiom.
However, a spate of more recent work (Tom Roeper, Roberto Zamparelli, and Kaja Borthen, among others) has shown the systematicity of these things and, in particular, the semantic affinity of bare singulars to bare plurals: they, too, upon close examination, are scopeless. The basic facts seem to be these: bare singulars are both lexically and positionally restricted. So we have contrasts such as those in (11):

(11) a. They put him in jail/prison/*penitentiary.
     b. I took my son to school/college/*university. (Am. Eng.)
Generally, they follow verbs or prepositions, but may appear occasionally as subjects of certain verbs:

(12) a. Prison has little to offer in the way of recreation.
     b. College is a good place to learn.
They may not be modified (unlike bare plurals):

(13) a. They sent him to (*big) jail.
     b. I watched it on television (*that had a 31" screen).
However, in conjunct cases and a couple of others, the lexical restrictions are eliminated or reduced:

(14) a. University and high school alike require much study.
     b. Neither television nor radio have become educational tools.
Impressionistically, these structures appear to share many of the positional constraints of bare plurals in Spanish and Italian, which have been analyzed as involving proper government of an empty D position, and they seem to share the lexical constraints of incorporated and incorporation-like structures found in other languages. They appear to be non-referential, in the following way. Consider a situation in which Bob is watching television. There's a definite TV he is then watching, and one can refer back to that TV, e.g. by continuing "...and then he turned it off and went to bed". However, if we use this sentence as the antecedent of VP ellipsis, consider the result:

(15) Bob was watching television, and Fred was, too.

There is no reading of this where both had to have been watching exactly the same TV set. This is what you'd expect if "television" were treated as a narrow-scope, nonspecific indefinite. Now consider cases involving definite noun phrases. If we use VP ellipsis in such cases, identity of reference is preserved:

(16) a. Bob attended the old brown school, and Sam did, too.
     b. Max liked the Brecht play a lot, and Susan did, too.
Naturally. They went to the same school, liked the same play, because of the definiteness. However, certain unmodified, lexically selected nouns seem to work differently:

(17) a. Sam is in the hospital again, and so is Mary.
     b. I heard about the riot on the radio, and Sharon did, too.
Sam and Mary need not be in the same hospital, and while Sharon and I heard about the same riot, our radios may well have been different, as if the phrase were indefinite. But we have this definite article; what are we to make of that?
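The referential contrast this ellipsis test probes can be mimicked with a toy interpreter in which a definite description denotes a single fixed referent, while a nonspecific indefinite introduces a fresh one on each evaluation. Everything here (the NP representation, the referent bookkeeping) is a hypothetical device of my own, not a claim about any particular semantic theory:

```python
import itertools

_new_referent = itertools.count()          # an endless supply of referents

def interpret_np(np, context):
    """Definites denote one fixed referent; nonspecific indefinites don't."""
    kind, noun = np
    if kind == "definite":
        # the same noun always resolves to the same stored referent
        return context.setdefault(noun, next(_new_referent))
    return next(_new_referent)             # indefinite: fresh referent each time

context = {}
# VP ellipsis re-evaluates the copied VP in the second conjunct:
tv_bob  = interpret_np(("indefinite", "television"), context)
tv_fred = interpret_np(("indefinite", "television"), context)
sch_bob = interpret_np(("definite", "school"), context)
sch_sam = interpret_np(("definite", "school"), context)

print(tv_bob == tv_fred)    # False: need not be the same TV set
print(sch_bob == sch_sam)   # True: the same old brown school
```

On this toy model, the "expletive article" reading of (17) amounts to interpreting "the hospital" as if it were the indefinite case.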
It seems many noun phrases with definite articles work this way in English, but others (such as "the riot") don't. There is a certain amount of work, by Longobardi, and by Vergnaud and Zubizarreta, that introduces the notion of an "expletive article," one put in to produce a noise in an otherwise empty D position. What I'm suggesting here is that in instances such as (17) there is a reading (the most natural one) where the definite article is expletive: that is, we "really" have an instance of a bare singular in each case, the semantics of which is similar to that of bare plurals. Whether this is correct or not remains to be seen. My present point is: how could one see through the complexity of language to learn such facts? There would appear to be considerable usefulness in such notions as meaningful things that occasionally mean nothing, and things like null determiners.

I had better wind up. From the point of view of linguistic theory, many of the things I have been talking about seem fairly unproblematic. This is because we have ideas about how to resolve many of the mismatches we find between form and meaning. Nevertheless, natural languages, from a commonsensical point of view, seem treacherously designed. We have somethings which mean nothing, and nothings which mean something. We have two things meaning one thing, and one thing meaning two things. We have things in disguise, meaning in highly constrained contexts what something else means that it normally contrasts with. We have things that, even if their meaning is a single, normal-seeming meaning, are put in the wrong place and have to be interpreted elsewhere. But, obviously, things like this are learned by tens of millions annually. And at this point I have absolutely no sensible ideas about how this could be, in part because the (vague) commonsense view taken here is clearly not right, and I don't know exactly what to replace it with.
But there is one clear overriding reaction I do have: thinking about how hard a problem all this is makes me very happy I'm not a language learner.

Department of Linguistics
University of Rochester
Rochester, NY, USA 14627
[email protected]
University of Rochester Working Papers in the Language Sciences, Vol. Spring 2000, no. 1
Katherine M. Crosswhite and Joyce McDonough (eds.)

Vowel Reduction in Russian: A Unified Account of Standard, Dialectal, and "Dissimilative" Patterns*
Katherine Margaret Crosswhite ([email protected])
Center for the Sciences of Language, Lattimore Hall, University of Rochester, Rochester, New York 14627
Abstract: This paper provides an Optimality-Theoretic analysis of a number of Russian vowel reduction patterns. In particular, the analysis presented here relies on a non-unitary approach (Crosswhite 1999) to two-pattern vowel reduction systems, such as those typically seen in Russian dialects. Furthermore, a particularly complex dialectal pattern, traditionally referred to as "dissimilative" reduction, is analyzed here without use of direct featural dissimilation. Instead, constraints on sonority, lengthening under stress, and foot form conspire to allow the quality of the stressed vowel of some word to indirectly affect the surface quality of the preceding unstressed vowel.
1. Introduction: Vowel Reduction in Russian

Vowel reduction is a prominent characteristic of the phonology of both Contemporary Standard Russian (CSR) and a number of Russian dialects. In this work, I will discuss several different types of vowel reduction found in the Russian language and provide a formal analysis for them. In particular, the approach presented here allows a wide range of Russian vowel reduction patterns to be accounted for using the same basic theoretical machinery—in particular, no special mechanisms have to be introduced to account for the so-called "dissimilative" patterns of reduction found in some dialects. This contrasts with treatments such as Halle (1965), Nelson (1974), Davis (1970), and Suzuki (1998), where the dissimilative reduction patterns are analyzed as fundamentally different from the non-dissimilative reduction patterns, requiring either rule modifications specific to the dissimilative dialects, or constraints that pertain only to the
* This manuscript is a modified version of the Russian analysis in my 1999 UCLA dissertation. I would like to thank Henning Andersen, Tim Beasley, and Bruce Hayes for helpful comments and suggestions for revisions.
www.ling.rochester.edu/wpls/s2000n1/crosswhite.pdf
dissimilative environment. In the analysis presented here, the non-dissimilative reduction patterns are seen to be simply special cases of the dissimilative pattern.

The formal approach taken towards vowel reduction in this work is that of Crosswhite (1999), in which two different categories of vowel reduction are posited—one based on the elimination of difficult perceptual categories in unstressed syllables (such as unstressed non-peripheral vowels), and the other based on the elimination of unstressed high-sonority vowels. These two tendencies are formalized using Optimality-Theoretic constraints of two different types: licensing constraints and prominence constraints. In this respect, this article can be thought of as the Optimality-Theoretic implementation of the basic insights outlined in Jakobson's 1929 Remarques sur l'évolution phonologique du russe comparée à celle des autres langues slaves. In Remarques, Jakobson identifies two general characteristics of the reduction patterns seen in Russian. The first is the tendency for "reduction of atonic vowels to three phonemes, the cleanest and most characteristic in terms of timbre, the 3 'points of the vowel triangle.'" This tendency is encoded in the current analysis using licensing constraints that limit non-peripheral vowel qualities to stressed syllables. The second generalization made by Jakobson is that, to increase "the contrastiveness between stressed and unstressed vowels, there is a tendency to strengthen the first and weaken the second." This idea of the rich getting richer and the poor getting poorer is represented in Optimality Theory using prominence constraints (McCarthy and Prince 1993).

Use of Optimality Theory as the theoretical framework for this analysis allows these two motivating factors to be expressed as distilled phonological ideals, or constraints—a fact that has several beneficial results. First, vowel reduction constraints based on these two phonological ideals are able to vary independently.
In some dialects, both will be active and capable of causing surface alternations in vowel quality. In other dialects, one or the other constraint may be inactive. In yet other dialects, one or the other constraint may be blocked only in certain environments, environments where the other constraint is not subject to any circumscription. As we shall see, all three of these situations are played out in Russian vowel reduction patterns, thus providing empirical support for the analysis provided here.
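The evaluation procedure assumed by such an Optimality-Theoretic analysis can be sketched generically: each candidate receives a tuple of violation counts ordered by constraint ranking, and the candidate with the lexicographically smallest tuple wins. The constraints below are simplified stand-ins of my own devising, not the actual constraints of the analysis to follow:

```python
def ot_winner(candidates, ranked_constraints):
    """Return the candidate with the lexicographically least violation profile."""
    return min(candidates,
               key=lambda c: tuple(con(c) for con in ranked_constraints))

# Simplified stand-in constraints for a single unstressed vowel slot:
def license_mid(v):       # licensing: no mid vowels outside stress
    return 1 if v in {"e", "o"} else 0

def star_sonorous(v):     # prominence: no high-sonority unstressed vowels
    return 1 if v in {"a", "e", "o"} else 0

def ident(v):             # faithfulness to a hypothetical input /o/
    return 0 if v == "o" else 1

# Ranking licensing >> prominence >> faithfulness favors a low-sonority vowel:
print(ot_winner(["o", "a", "i"], [license_mid, star_sonorous, ident]))
```

Ranking faithfulness above both reduction constraints instead leaves input /o/ unreduced, which is the general mechanism by which reranking derives dialect differences.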
2. Data

Before discussing the formal analysis of Russian vowel reduction, I will lay out the basic Russian vowel reduction pattern, as well as provide a brief account of some of the dialectal variants to be accounted for later. This section is
included to provide an overview of the empirical problem. More detailed descriptions will be presented for each of the dialectal patterns when that pattern is analyzed in the subsequent sections. Throughout this work, Russian dialectal reduction patterns will be referred to using Anglicized versions of the traditional Russian dialectological names—for more information, see Note 1 (p. 59).

2.0.1. Similarities in Reduction Patterns: Surface Sub-Inventory

Not all dialects of Russian have vowel reduction. The dialects belonging to the Northern dialect group usually either lack reduction or have only a weak form of reduction. Dialects in the Central and Southern dialect groups (including Contemporary Standard Russian (CSR), which is technically a member of the Central dialect area) are characterized by vowel reduction. Of those dialects that show vowel reduction processes, the majority show a "two-pattern" reduction process, with a moderate reduction pattern operating in the syllable that immediately precedes the stress, and an extreme reduction pattern operating in (most of) the remaining unstressed syllables.

Before investigating the many and varied patterns of reduction, let us take a moment to look at the ways in which these patterns are similar. Specifically, most of these reduction patterns generate similar surface sub-inventories. In other words, many of these different dialects achieve the same ends by different means. As just mentioned, the majority of Russian dialects that have vowel reduction display two degrees of reduction. These two different degrees of reduction produce different vowel sub-inventories. Specifically, the first and more moderate degree of reduction usually occurs in the syllable that immediately precedes the stress, and usually produces the vowel sub-inventory [i, u, a]. I will refer to this type of neutralization as moderate reduction.

The second and more extreme degree of reduction occurs in the remaining unstressed syllables, and usually produces the vowel sub-inventory [i, u, ə]. I will refer to this type of neutralization as extreme reduction. These vowel sub-inventories are illustrated in the following diagram. (Note: Here and throughout this chapter, transcriptions will not reflect subtle and/or gradient changes in vowel quality such as those that can be observed, for example, when comparing stressed and unstressed tokens of /i/ or /u/, or when considering the variants of /i/ that occur after palatalized and non-palatalized consonants.)
(1) Vowel Sub-Inventories in Russian Dialects

    Other pretonic syllables:       [i, u, ə]        (low-sonority V's only)
    Immediately pretonic syllable:  [i, u, a]        (peripheral V's only)
    Stressed syllable:              [i, u, e, o, a]  (all underlying V qualities)
    Post-tonic syllables:           [i, u, ə]        (low-sonority V's only)
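The sub-inventories in (1) amount to a position-dependent mapping on vowel quality. Below is a sketch for a CSR-style pattern in non-palatalized contexts only; the tables are a deliberate simplification of my own (dialects differ, e.g. in sending pretonic /e/ to [a]):

```python
# Two degrees of reduction as position-dependent mappings (CSR-style,
# non-palatalized contexts only; a simplified illustration).

MODERATE = {"i": "i", "u": "u", "e": "i", "o": "a", "a": "a"}  # -> [i, u, a]
EXTREME  = {"i": "i", "u": "u", "e": "i", "o": "ə", "a": "ə"}  # -> [i, u, ə]

def reduce_vowel(v, position):
    """position: 'stressed', 'pretonic' (immediately pretonic), or other."""
    if position == "stressed":
        return v                 # all underlying qualities licensed under stress
    if position == "pretonic":
        return MODERATE[v]
    return EXTREME[v]

# /gorodók/ 'little city': each /o/ reduces according to its position
print([reduce_vowel("o", "unstressed"),
       reduce_vowel("o", "pretonic"),
       reduce_vowel("o", "stressed")])
```

The two tables differ only in how much sonority they permit, which is the sense in which the two degrees of reduction share one mechanism.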
As noted above, the patterns of neutralization that generate these sub-inventories differ from dialect to dialect. For example, in CSR, unstressed /e/ reduces to [i] in the immediately pretonic syllable (as well as in the other unstressed syllables). In other dialects, unstressed /e/ reduces to [a] in the immediately pretonic syllable but reduces to [i] in other unstressed syllables. It is fairly constant cross-dialectally, however, that barring interference from palatalized consonants, unstressed /o, a/ neutralize to [a] in the immediately pretonic syllable, but reduce to [ə] in other unstressed syllables.

In the so-called "dissimilative" vowel reduction patterns, which are found predominantly in dialects of the south and south-western regions of the Russian folk-dialect area, the surface sub-inventories differ from the pattern already described. In these dialects, the two-pattern reduction system utilizing both moderate and extreme neutralizations holds only for certain words. In the remaining words, only extreme reduction is found—that is, the immediately pretonic syllable in such words is subject to extreme rather than moderate reduction. Any given word will predictably fall into either one group or the other based on the quality of the stressed vowel. If the stressed vowel is relatively low in sonority, the two-pattern system will hold. If the stressed vowel is relatively high in sonority, the moderate neutralization pattern that would otherwise be expected in the immediately pretonic syllable will not show up. There are several variations on this pattern. The main parameter of variation concerns precisely which vowels are considered "high in sonority" and which "low in sonority." One of the attested patterns is illustrated below. (As illustrated, many of these dialects have 6- or 7-vowel systems under stress.)1
1 Fans of Russian dialectology will note that I do not provide a treatment here of either assimilative or assimilative-dissimilative Russian vowel reduction. Based on the instrumental observations of Kasatkina and Shchigel' (1995), it seems as though the "assimilative" part of assimilative-dissimilative vowel reduction is truly featural assimilation. Since I do not analyze dissimilative reduction using featural dissimilation (cf. section 3), this does not make for a contradictory state of affairs.
(2) Vowel Sub-Inventories: Dissimilative Russian Dialects

    Words with a stressed high vowel:
      Other pretonic syllables:       [i, u, ə]  (low-sonority V's only)
      Immediately pretonic syllable:  [i, u, a]  (peripheral V's only)
      Stressed syllable:              [i, u]     (high V's only, by definition)
      Post-tonic syllables:           [i, u, ə]  (low-sonority V's only)

    Words with a stressed non-high vowel:
      Other pretonic syllables:       [i, u, ə]        (low-sonority V's only)
      Immediately pretonic syllable:  [i, u, ə]        (low-sonority V's only)
      Stressed syllable:              [e, ɛ, o, ɔ, a]  (non-high V's only, by definition)
      Post-tonic syllables:           [i, u, ə]        (low-sonority V's only)
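Stated procedurally, the pattern in (2) makes the degree of pretonic reduction a function of the stressed vowel's sonority class. A sketch, assuming a Don/Belgorod-style grouping in which only the high vowels count as low-sonority; the reduction tables are simplified stand-ins, not the full dialect data:

```python
# Dissimilative reduction, schematically: the immediately pretonic syllable
# gets moderate reduction only when the stressed vowel is low in sonority.
# The sonority grouping and tables are simplified assumptions.

MODERATE = {"i": "i", "u": "u", "e": "a", "o": "a", "a": "a"}  # -> [i, u, a]
EXTREME  = {"i": "i", "u": "u", "e": "i", "o": "ə", "a": "ə"}  # -> [i, u, ə]

def pretonic(v, stressed_v):
    low_sonority_stress = stressed_v in {"i", "u"}   # Don/Belgorod grouping
    return (MODERATE if low_sonority_stress else EXTREME)[v]

print(pretonic("a", "u"))   # stressed high vowel: moderate reduction
print(pretonic("a", "a"))   # stressed [a]: extreme reduction, so no [a...á]
```

Note that nothing in the sketch compares vowel features directly; the "dissimilation" falls out of which table is selected, which anticipates the analysis to come.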
The name "dissimilative" comes from the observation that the reduction vowel [a] cannot be used in the immediately pretonic syllable if the vowel under stress is also [a]. Currency of the term "dissimilative" may have been enhanced by the existence of assimilative vowel reduction patterns in other dialects (which will not be analyzed here; see fn. 1). The existence of both assimilatory and dissimilatory variants of a given phenomenon makes for an appealingly symmetrical classificatory system. I will argue, however, that dissimilative vowel reduction does not in fact involve any direct interaction between the vowels of the tonic and immediately pretonic syllables. This being the case, the name "dissimilative" is perhaps misleading, since the formal analysis does not make use of featural dissimilation. I will continue to use the traditional dialectological name Dissimilative—capitalization of the term indicates that it is simply a name, not a description. It should not be taken as indicative of the formal analysis of that pattern any more so than would the other traditional dialectological names used in this work (i.e., Obojan, Don, Sudzha, okan'e, etc.). The variant illustrated above is referred to as Don or Belgorod Dissimilative reduction. In other variants of the Dissimilative pattern, the stressed vowels group differently with respect to either triggering or blocking the two-pattern reduction system—but the groupings are always based on sonority. Additionally, the Dissimilative pattern can be affected by the palatality of the consonants surrounding a given unstressed vowel, generating Dissimilative
dialects where the two-pattern system is blocked in contexts containing a palatalized consonant, or where two different variants of the Dissimilative pattern occur—one in contexts that have palatalized consonants, and the second in contexts lacking them. These variants will be discussed and analyzed in more detail in section 3.1. In the following section, I will give a brief overview of the methods of neutralization that actually generate the sub-inventories presented above.

2.0.2. Vowel Neutralization in Non-Immediately-Pretonic Unstressed Syllables

The neutralization processes found in the non-immediately-pretonic syllables (i.e., extreme reduction) show little variation, compared to the variety of neutralizations that are seen in the immediately pretonic syllable. One question surrounding the neutralization processes seen in Russian extreme reduction, however, concerns the status of unstressed /e/. Namely, it is sometimes supposed that the reduction of unstressed /e/ to [i] is due to the influence of palatalized consonants, since /e/ is almost exclusively found after a palatalized consonant. This does not seem to be the case for at least those dialects where the relevant data is available. Therefore, I will treat the reduction of Russian /e/ to [i] as an independent reduction pattern (i.e., not due to the surrounding consonantal environment). For more detailed discussion of this point, please see Note 2 (p. 60). With this in mind, we can summarize the vowel neutralization patterns in the non-immediately-pretonic syllables as illustrated below. Example forms from CSR are provided.

(3) Extreme Neutralizations, common to most dialects with reduction

    After non-palatalized consonants:  i → [i];  u → [u];  (e), a, o → [ə]
    After palatalized consonants:      i → [i];  u → [u];  e, a, o → [i]
Crosswhite—Vowel Reduction in Russian
After Non-Palatalized
/tsexovój/ [tsixavój] ‘shop’ (adj.), cf. [tséx] ‘shop’
/sadovód/ [sədavót] ‘gardener’, cf. [sát] ‘garden’
/gorodók/ [gəradók] ‘little city’, cf. [górət] ‘city’
After Palatalized
/rʲetɕovój/ [rʲitɕivój] ‘speech’ (adj.), cf. [rʲétɕ] ‘speech’ (n.)
/pʲatotɕók/ [pʲitatɕók] ‘five-kopeck coin’, cf. [pʲátʲ] ‘five’
/tʲoplotá/ [tʲiplatá] ‘warmth’, cf. [tʲóplij] ‘warm’
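The two-environment extreme-reduction pattern summarized in (3) can be sketched as a small lookup. The Python function below is purely illustrative: the vowel symbols and the boolean flag are my assumptions, not part of the original analysis.

```python
# Sketch of the extreme-reduction neutralizations in (3).
# The function and its arguments are illustrative assumptions.

def extreme_reduce(vowel, after_palatalized):
    """Surface vowel for a non-immediately-pretonic unstressed syllable,
    per the pattern common to most reducing dialects."""
    if vowel in ("i", "u"):              # high vowels survive unchanged
        return vowel
    if vowel == "e" or after_palatalized:
        return "i"                       # /e/, and /a,o/ after C', raise to [i]
    return "ə"                           # /a,o/ after a plain consonant centralize

# /sadovód/ -> [sədavót]: initial /a/ is non-immediately-pretonic
assert extreme_reduce("a", after_palatalized=False) == "ə"
# /tʲoplotá/ -> [tʲiplatá]: /o/ after a palatalized consonant
assert extreme_reduce("o", after_palatalized=True) == "i"
```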
The vowel /e/ is shown in parentheses in the illustration above (in the context representing reduction after a non-palatalized consonant) since it is not clear if this portion of the process can be generalized to all dialects. On the question of reduction of unstressed /e/ after non-palatalized consonants, as well as after palatalized consonants, see Note 2 (p. 60). In summary, the vowel neutralization patterns seen in the non-immediately-pretonic unstressed syllables in Russian dialects characteristically avoid the occurrence of high-sonority mid and low vowels, which typically surface as low-sonority [ə] (after non-palatalized consonants) or [i] (after palatalized consonants, or for underlying /e/). Although this pattern of extreme reduction appears to be very widespread, a variant pattern has been described by Avanesov (1984) in which unstressed /e/ surfaces unreduced. This is described as characteristic of certain speakers of the "Old Muscovite" dialect. See section 3.1.4 for further discussion.

2.0.3. Vowel Neutralization Patterns in Immediately-Pretonic Syllables—Non-Dissimilative Variants

The vowel neutralization patterns found in the immediately pretonic syllables in Russian dialects show more variety than the patterns discussed above. Generally, the vowel reduction patterns found in immediately pretonic syllables can use more sonorous reduction vowels than those found in other unstressed syllables.

2.0.3.1. The [a]-reduction Pattern of Moderate Neutralization

The pattern that is generally taken to be the most basic or "default" pattern is one in which all non-high vowels reduce to [a] in the immediately pretonic
syllable, regardless of the palatality of the preceding or following consonant. Traditionally, this pattern is referred to as akan'e (roughly, "saying [a]"); I shall refer to this pattern as [a]-reduction. This pattern is illustrated below, along with some example forms illustrating the appropriate alternations. (Here, /e/ is not listed in the environment after a non-palatalized consonant since data establishing the occurrence of /e/ in that context is not available for these dialects.)

(4) Moderate Neutralization via [a]-reduction

Immediately Pretonic, After Non-Palatalized:  /i/ → [i], /u/ → [u]; /o, a/ → [a]
Immediately Pretonic, After Palatalized:      /i/ → [i], /u/ → [u]; /e, o, a/ → [a]

/rʲeká/ [rʲaká] ‘river’, cf. [rʲétɕkə] ‘little river’
/pʲatí/ [pʲatí] ‘five’ (gen. sg.), cf. [pʲátʲ] ‘five’ (nom. sg.)
/nʲosú/ [nʲasú] ‘I carry’, cf. [nʲós] ‘he carried’
2.0.3.2. Other Forms of Moderate Reduction Although [a]-reduction is usually taken as the original moderate reduction pattern in Russian, it should be pointed out that a number of other moderate reduction patterns are widely attested. In particular, additional patterns of moderate reduction might use additional reduction vowels (such as [e]), and might be sensitive to the presence of palatalized consonants on one or both sides of the vowel in the immediately pretonic syllable. Each of the moderate reduction patterns that will be addressed in this work is listed below, along with a brief description.
• [i]-reduction: In the immediately pretonic syllable, /a,o/ reduce to [i] if the preceding consonant is palatalized. (/e/ always reduces to [i].)
• [e]-reduction: In the immediately pretonic syllable, /e/ does not reduce and instead surfaces as [e]. Additionally, /o,a/ in the immediately pretonic syllable reduce to [e] if there is a preceding palatalized consonant.
• attenuated [a]-reduction: In the immediately pretonic syllable, /o,a,e/ reduce to [a], unless flanked on both sides by palatalized consonants. In the doubly-flanked environment Cʲ__Cʲ, the vowels /o,a,e/ reduce to [i]. (Does not affect contexts of extreme reduction, where reduction to [i] does not require the double-sided environment.)
• incomplete reduction:2 The vowel in the immediately pretonic syllable does not reduce. (Does not affect contexts of extreme reduction.)
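The moderate-reduction variants listed above can be summarized as a single dispatch function. This is a sketch only: the pattern names are mine, and the treatment of cases the list leaves implicit (e.g., /a,o/ after a non-palatalized consonant in [i]-reduction dialects) is a simplifying assumption.

```python
# Sketch of the moderate-reduction variants; names and the handling of
# unlisted environments are illustrative assumptions.

def moderate_reduce(vowel, pattern, pre_pal=False, post_pal=False):
    """Surface form of /a,o,e/ in the immediately pretonic syllable."""
    if vowel not in ("a", "o", "e"):
        return vowel                      # /i,u/ are unaffected
    if pattern == "a-reduction":          # akan'e: all non-high vowels -> [a]
        return "a"
    if pattern == "i-reduction":          # [i] after C' (and always for /e/)
        return "i" if (pre_pal or vowel == "e") else "a"
    if pattern == "e-reduction":          # /e/ stays [e]; /o,a/ -> [e] after C'
        return "e" if (pre_pal or vowel == "e") else "a"
    if pattern == "attenuated":           # [i] only between two C'
        return "i" if (pre_pal and post_pal) else "a"
    if pattern == "incomplete":           # okan'e: no reduction here
        return vowel
    raise ValueError(f"unknown pattern: {pattern}")

assert moderate_reduce("e", "a-reduction", pre_pal=True) == "a"   # /rʲeká/ -> [rʲaká]
assert moderate_reduce("o", "attenuated", pre_pal=True, post_pal=False) == "a"
```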
3. Analysis

In Crosswhite (1999), the general approach towards two-pattern vowel reduction phenomena is as follows: Moderate reduction occurs in all unstressed syllables, and is motivated by licensing constraints. Extreme reduction occurs in a subset of unstressed syllables, and is caused by prominence constraints. The context in which extreme reduction pertains is represented moraically—extreme reduction affects those unstressed syllables which are nonmoraic. Since stressed syllables are obligated to be moraic, these environments stand in a set/subset relation, and a two-pattern reduction system will only occur if the subset constraint (prominence reduction, causing the “extreme” neutralizations) outranks the more general constraint (contrast enhancement, causing the “moderate” neutralizations). This also predicts, correctly, that extreme reduction will occur in the intersection of these two sets (the subset), while only moderate reduction will occur in the complement. When applied to the Russian vowel reduction patterns sketched above, this approach provides a good fit to the data, capturing all the necessary empirical facts. In addition, some of the dialectal variants offer empirical support for this sort of two-pronged approach. Namely, certain dialects are aptly described as resulting from grammars where some constraint(s) must intervene between the 2
Traditionally referred to as incomplete okan'e. The term okan'e refers to the lack of reduction ("saying [o]" in unstressed position). Incomplete okan'e therefore refers to a partial lack of reduction: reduction does not affect the immediately pretonic syllable.
two vowel reduction constraints, or where one of the vowel reduction constraints is absent—a result that is only possible if there are two orthogonal vowel reduction constraints in the grammar. I will start by discussing extreme reduction, which is caused in this analysis by a prominence reduction constraint. The first step in analyzing this pattern is to isolate the environment in which extreme reduction occurs. I will argue that in Russian, extreme reduction strikes unfooted, nonmoraic syllables.

3.0.1. Extreme Reduction and Russian Foot Form

As laid out above, Russian vowel reduction shows a moderate neutralization pattern in the immediately pretonic syllable, and an extreme neutralization pattern in other unstressed syllables. In the analysis provided here, I will account for this fact by analyzing these two syllables as constituting a prosodic domain—a foot. This foot structure has previously been proposed for Russian by Halle and Vergnaud (1987) and Alderete (1995). The proposed foot structure is right-prominent: (σσ́), suggesting that Russian is an iambic language. In accordance with Prince and Smolensky (1993), I will conclude that Russian uses the constraint RHTYPE=IAMB. It is important to note that distinguishing between the immediately pretonic syllable and the other unstressed syllables is necessary not only for Russian vowel reduction, but for Russian word prosody as well. For example, the unstressed vowel in the immediately pretonic syllable in many Russian dialects is durationally distinct from other unstressed vowels of the same quality. Furthermore, although unstressed vowels in Russian are frequently devoiced or deleted in fast speech, the vowel of the immediately pretonic syllable is not—according to Zemskaja (1987, p. 201), vowel deletion is most common for the unstressed vowel immediately following the stressed syllable, and next most common for the vowel in the 2nd pretonic syllable.
In other words, effacement of unstressed vowels is most likely in those unstressed syllables immediately adjacent to the proposed iambic foot. It should be noted, however, that this foot form is not common to all the Russian dialects. Research by Vysotskii (1973) and Almukhamedova and Kul’sharipova (1980) reveals the existence of various dialectal rhythmical variants. As pointed out by Kasatkina (1996), all of these variants can be grouped into two large categories: the “strong center and weak periphery” group and the “wave contour” group. As suggested by the names, the “strong center and weak periphery” rhythmic pattern is characterized by increased duration of the tonic and immediately pretonic syllables (which constitute the “strong center”) and
decreased duration for all remaining syllables (the “weak periphery”). Kasatkina (1996) suggests that this prosodic pattern is a defining characteristic of the central Russian dialect area, to which Contemporary Standard Russian (CSR) belongs. The “wave” rhythmical pattern is characterized by increased duration for the stressed vowel, with lengthening also occurring for syllables removed by one syllable from the stress; the syllables immediately adjacent to the stress are short. Almukhamedova and Kul’sharipova (1980, p. 47) observe this rhythmic lengthening pattern in north Russian dialects without vowel reduction, and note that this sort of rhythmic organization is similar to that of Ukrainian and may be a remnant of a previous prosodic system. Importantly, these different rhythmical patterns are found in areas with different vowel reduction behaviors: the strong center and weak periphery pattern predominates in the central Russian dialect area, whose members usually show moderate or no reduction in the immediately pretonic syllable, but extreme reduction in the remaining unstressed syllables; the wave pattern is found in the north Russian dialect area, whose members typically lack significant vowel reduction. It seems reasonable to suppose, therefore, that the conditioning environment for moderate vowel reduction is tied to foot form: dialects with moderate reduction in the immediately pretonic syllable use the foot form (σσ́). To account for the fact that the foot form of the central Russian dialects has such a profound effect on the duration of unfooted vowels, I will make the following claim: the footed syllables of Russian are moraic, while the unfooted syllables are nonmoraic. We can say, for example, that the moraic (footed) vowels of Russian are guaranteed to attain a certain minimum duration, since they possess timing units (moras).
The nonmoraic (unfooted) syllables, however, are not guaranteed any minimum duration since they lack timing units—this might mean realization of a nonmoraic vowel as very short, deleted, devoiced, or (as described for extremely reduced Russian vowels in Bondarko et al. 1966, p. 63) as a vowel that is highly overlapped with the preceding consonant. Formally, the moraic distribution described above for Russian can be derived using the following constraints: *STRUC-µ: Moras do not occur in output forms. CULMINATIVITY: A prosodic word has exactly one stress. FTBINµ: Feet have at least two moras. The constraint *Struc-µ is a structure avoidance constraint. It assigns one violation mark for every mora that occurs in an output candidate. Culminativity
assigns one violation mark to any output candidate that does not have exactly one stress. The FtBinµ constraint is a familiar binarity constraint that demands all feet have two moras: It assigns one violation mark to any foot in an output form that does not have at least two moras. The appropriate moraic distribution is achieved in Russian by ranking Culminativity and FtBinµ above *Struc-µ, as shown in the following tableau:

(5) Deriving Foot Structure: Culminativity, FtBinµ » *Struc-µ

/σσσσ́σσ/              CULMINATIVITY | FTBINµ | *STRUC-µ  | comments
→ σσ(σµσ́µ)σσ                        |        | **        | winner
  σµσµ(σµσ́µ)σµσµ                    |        | ******!   | too many moras
  σσ(σσ́µ)σσ                         | *!     | *         | foot isn’t binary
  σσσ(σ́µ)σσ                         | *!     | *         | foot isn’t binary
  σσσσσσ              *!            |        |           | no stress
  (σµσµ)(σµσ́µ)(σµσµ)  *!            |        | ******    | too many stresses
As shown in this tableau, the combination of FtBinµ and *Struc-µ conspires to exclude all but two moras from the winning output form: In other words, the winning candidate is the one that has as few unstressed moras as possible without violating the two higher-ranking constraints. Before moving on, it should be noted that at this point it is difficult or impossible to determine that moraicity is the critical factor in deciding where extreme reduction and moderate reduction apply. Based on the vowel reduction facts discussed so far, the different distribution of extreme vs. moderate vowel reduction in Russian could be described in terms of footedness vs. nonfootedness. There are, however, certain exceptions to the pattern already described—these exceptions can be expressed in terms of moraicity, but not footedness. For example, unstressed /a,o/ undergo moderate reduction when they occur in unstressed position at the extreme left edge of the prosodic word—regardless of the distance between that syllable and the stressed syllable. For example, forms like /ogoród/ ‘vegetable garden’ and /antropológija/ ‘anthropology’ are pronounced [agarót] and [antrəpalógʲijə], respectively. Note that the initial vowels reduce to [a] and not [ə], even though they are not immediately pretonic: Extreme reduction has been blocked. This blockage cannot be the result of a foot, since there is no secondary stress on these vowels. Furthermore, mere extension of the main stress foot to include the word-initial vowels cannot be a possibility,
since such a structure would predict that all vowels intervening between the first vowel and the stressed vowel would also be subject to moderate reduction. The form [antrəpalógʲijə] shows that this is not the case. There is nothing, however, that would prevent these vowels from being moraic. In fact, the duration of word-initial unstressed vowels is increased (Zlatoustova 1981), and such vowels are not subject to the deletion and devoicing phenomena observed with nonmoraic unstressed vowels in Russian. An alignment constraint can derive this effect: Align-µ: The left edge of every word must align with some mora. Assuming that onset consonants are barred from being moraic, this constraint will enforce the presence of a word-initial mora only in those cases when the first segment of a word is a vowel. The moraic basis for the distribution of extreme vs. moderate vowel reduction is also supported by evidence from European and Brazilian Portuguese (Brakel 1985, Carvalho 1988-92). This evidence is discussed in more detail in section 4.0.1.

3.0.2. Extreme Reduction as Prominence Reduction

Given the moraic distribution discussed for Russian in the preceding section, the constraint that motivates extreme vowel reduction can now be introduced:3 *Nonmoraic/-high: Nonmoraic vowels may not have a sonority greater than that of i,u. Here, vowel sonority is defined based on inherent duration and/or jaw position. According to these criteria, [ə] is the least sonorous vowel, and [i,u] are the next most sonorous. This means that the *Nonmoraic/-high constraint will assign one violation mark to any surface nonmoraic vowel that is not [i], [u], or [ə]. As discussed in section 2.0.2, the neutralizations that are used to avoid violation of *Nonmoraic/-high are different for underlying /o,a/ on the one hand and 3
This constraint is formally derived using Prince and Smolensky's prominence-alignment mechanism (Prince and Smolensky 1993). Prominence alignment formally produces a ranked family of prominence constraints. Here, since no constraints need to be interleaved between the topmost members of this constraint family, I am "encapsulating" these into a single constraint for ease of presentation.
underlying /e/ on the other: /a,o/ reduce to [ə] under extreme reduction (barring the presence of a palatalized consonant), while /e/ reduces to [i]. The two following tableaux illustrate extreme reduction of nonmoraic /o,a/ to [ə]. Note: only violations for the unfooted unstressed vowel are considered in these tableaux.

(6) Extreme Reduction for /o,a/: *Nonmoraic/-high » Dep[+high]

/domovój/ ‘house spirit’   *NONMORAIC/-high | DEP[+HI]
a. → də(mavój)                              |
b.   du(mavój)                              | *!
c.   di(mavój)                              | *!
d.   da(mavój)             *!               |
e.   do(mavój)             *!               |
f.   de(mavój)             *!               |

/sadovód/ ‘gardener’       *NONMORAIC/-high | DEP[+HI]
g. → sə(davót)                              |
h.   su(davót)                              | *!
i.   si(davót)                              | *!
j.   sa(davót)             *!               |
k.   so(davót)             *!               |
l.   se(davót)             *!               |
As shown in these tableaux, the ranking of *Nonmoraic/-high above Dep[+hi] produces the correct neutralization pattern for both nonmoraic /o/ and /a/. The *Nonmoraic/-high constraint rules out all candidates with sonorous vowels in nonmoraic position (candidates d-f and j-l). Of the remaining candidates, the [ə]-reduced forms (candidates a and g) are the winners because they do not involve insertion of a [+hi] feature specification. The candidates with high vowels (candidates b, c, h, and i) do involve insertion of [+hi], and are therefore ruled out by Dep[+hi]. Now let’s consider the reduction of nonmoraic /e/ in Russian. Recall that nonmoraic /e/ does not follow a centralizing reduction pattern: instead of reducing to [ə], nonmoraic /e/ reduces to [i]:
(7) Extreme Reduction for /e/: *Nonmoraic/-high and Max[+front] » Dep[+high]

/tsexovój/ ‘(factory) shop’ (adj.)   *NONMORAIC/-high | MAX[+FT] | DEP[+HI]
→ tsi(xavój)                                          |          | *
  tsə(xavój)                                          | *!       |
  tsu(xavój)                                          | *!       | *
  tse(xavój)                         *!               |          |
  tso(xavój)                         *!               | *        |
  tsa(xavój)                         *!               | *        |
As demonstrated here, reduction via raising is derived for underlying /e/ due to the constraint Max[+front], which dominates Dep[+hi]. In other words, the [ə]-reduced form is unacceptable here since it involves sacrifice of the underlying frontness of the unstressed /e/. Reduction via raising is therefore the best option. Since /o,a/ are not underlyingly front, the constraint Max[+front] has no effect on the reduction of those vowels. Finally, extreme reduction of /o,a/ after a palatalized consonant produces [i] instead of [ə]. I will account for this effect using the following positional markedness constraint: Cʲ/[+front]: In unstressed syllables, a palatalized consonant must be followed by a [+front] vowel. In effect, the Cʲ/[+front] constraint is a type of licensing constraint that applies over strings of segments, rather than over single segments. In this respect, the Cʲ/[+front] constraint can be described as a position-specific sequential grounding constraint such as those discussed by Suzuki (1991). In other words, the Cʲ/[+front] constraint expresses the preference not to have the strings Cʲə, Cʲa, etc. in unstressed position. This constraint is perceptually motivated: Russian palatalized consonants are marked by a [j]-like off-glide. In stressed positions, this gives a following non-front vowel a diphthongal character—the first portion of a following non-front vowel is obscured by the palatalization of the preceding consonant, with the underlying non-palatality of the vowel only emerging later. In unstressed positions where vowels are briefer, there may not be enough
duration to convey both the phonemic palatalization of a palatalized consonant and the phonemic non-palatality of the unstressed vowel. The Cʲ/[+front] constraint applies pressure to resolve this conflict in favor of the palatalized consonant. (It should be noted, however, that the interaction between vowel reduction and consonant palatalization is somewhat more complicated than represented here, especially as concerns underlying /o/. For a more detailed discussion of this relationship, please see Note 3 on p. 61.) The ranking of the Cʲ/[+front] constraint is demonstrated below:

(8) Extreme Reduction after a Palatalized Consonant: Cʲ/[+ft] » Dep[+high]

/tʲoploxód/ ‘motorized ship’  *NONMORAIC/-high | MAX[+FT] | MAX[-HI] | Cʲ/[+FT] | DEP[+HI]
→ tʲi(plaxót)                                  |          | *        |          | *
  tʲə(plaxót)                                  |          | *        | *!       |
  tʲu(plaxót)                                  |          | *        | *!       | *
  tʲe(plaxót)                 *!               |          |          |          |
  tʲo(plaxót)                 *!               |          |          | *        |
  tʲa(plaxót)                 *!               |          |          | *        |

/tɕastotá/ ‘frequency’        *NONMORAIC/-high | MAX[+FT] | MAX[-HI] | Cʲ/[+FT] | DEP[+HI]
→ tɕi(statá)                                   |          | *        |          | *
  tɕə(statá)                                   |          | *        | *!       |
  tɕu(statá)                                   |          | *        | *!       | *
  tɕo(statá)                  *!               |          |          | *        |
  tɕe(statá)                  *!               |          |          |          |
  tɕa(statá)                  *!               |          |          | *        |
Assuming that underlyingly palatalized consonants are specified [+front], the ranking Max[+Front] » Cʲ/[+front] will prevent de-palatalization of the consonant when followed by a non-front vowel underlyingly. Also note that in these tableaux the constraint Max[-high] has been added, although it does not affect the choice of the winning candidate. Furthermore, the evidence provided in these tableaux does not give us enough information to determine its ranking with respect to the Cʲ/[+front] constraint, although we do know that it must be dominated by the vowel reduction constraint *Nonmoraic/-high (otherwise it
would block reduction). The ranking of Max[-high] with respect to Cʲ/[+front] will be discussed as it pertains to moderate reduction in subsequent sections, where it will be shown that the ranking of these two constraints varies dialectally and causes variation in moderate neutralization patterns.

3.0.3. Extreme Reduction in Dissimilative Dialects

The analysis of extreme reduction in dialects with the Dissimilative pattern is similar to the situation laid out in the preceding section. The operative difference is that extreme reduction has a wider sphere of application in the dialects with Dissimilative reduction: the immediately pretonic syllable sometimes undergoes extreme vowel reduction instead of moderate vowel reduction. In addition to the different distribution of extreme vs. moderate reduction, the Dissimilative dialects are also set apart by their rhythmic pattern. Recall that the occurrence of moderate reduction in the immediately pretonic syllable is associated with the “strong center and weak periphery” rhythmic pattern described in section 3.0.1 above. In the Dissimilative dialects, the “strong center and weak periphery” pattern is only found in that subset of words that retain a two-pattern reduction system (Kasatkina 1996, Kasatkin et al. 1989). To put it another way, the immediately pretonic syllable is parsed as part of the foot when the stressed vowel is low in sonority. In words where the stressed vowel is high in sonority, the two-pattern reduction system does not surface, and the immediately pretonic syllable experiences extreme reduction. In other words, if the stressed vowel is high in sonority, the immediately pretonic syllable is not included as part of the foot. This being the case, we can claim that the different distribution of extreme and moderate reduction in the Dissimilative dialects is caused by the fact that different words (predictably) place foot boundaries in different locations.
Furthermore, as noted in section 2.0.2, different variants of the Dissimilative pattern classify stressed vowel qualities differently with respect to their sonority. In one pattern of Dissimilative reduction, all non-high vowels are considered "high sonority", and therefore block occurrence of the two-pattern reduction system. This pattern is historically referred to as the Don Dissimilative pattern.4 Other basic Dissimilative variants include the Zhizdra and Obojan patterns, summarized below. 4
The name "Don" traditionally refers specifically to the occurrence of this pattern after palatalized consonants. Ward (1985) suggests the name "Belgorod" to refer
(9) Types of Dissimilative Reduction

Don pattern:      immediately pretonic /a,e,o/ → [a]    / ___ í, ú
                                               → [ə, i] / ___ é, ó, ɛ́, ɔ́, á
Obojan pattern:   immediately pretonic /a,e,o/ → [a]    / ___ í, ú, é, ó
                                               → [ə, i] / ___ ɛ́, ɔ́, á
Zhizdra pattern:  immediately pretonic /a,e,o/ → [a]    / ___ í, ú, é, ó, ɛ́, ɔ́
                                               → [ə, i] / ___ á

Realization of unstressed /a,e,o/ in the immediately pretonic syllable, by vowel under stress:

Vowel under stress   Don      Obojan   Zhizdra
í, ú                 [a]      [a]      [a]
é, ó                 [ə, i]   [a]      [a]
ɛ́, ɔ́                 [ə, i]   [ə, i]   [a]
á                    [ə, i]   [ə, i]   [ə, i]
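The conditioning in (9) can be sketched as a lookup from stressed-vowel quality to pretonic realization. The trigger sets below are read directly off (9); the function name and vowel symbols are illustrative.

```python
# Sketch of the Dissimilative variants in (9): whether immediately
# pretonic /a,e,o/ gets moderate [a] or extreme [ə]/[i] depends on
# the quality of the stressed vowel.

SONOROUS_TRIGGERS = {              # stressed vowels that block the
    "Don":     {"e", "o", "ɛ", "ɔ", "a"},   # two-pattern system
    "Obojan":  {"ɛ", "ɔ", "a"},
    "Zhizdra": {"a"},
}

def pretonic_reduce(stressed_vowel, pattern, after_palatalized=False):
    """Realization of immediately pretonic /a,e,o/ in a Dissimilative dialect."""
    if stressed_vowel in SONOROUS_TRIGGERS[pattern]:
        # extreme reduction: [i] after a palatalized consonant, else [ə]
        return "i" if after_palatalized else "ə"
    return "a"                      # moderate reduction

assert pretonic_reduce("i", "Don") == "a"       # [a] before stressed í, ú
assert pretonic_reduce("e", "Don") == "ə"       # extreme before stressed é
assert pretonic_reduce("e", "Zhizdra") == "a"   # only á triggers in Zhizdra
```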
In these dialects, certain stressed vowels condition the appearance of extreme reduction in the immediately pretonic syllable, or in other words, certain stressed vowels condition the appearance of a monosyllabic foot (i.e., a foot that does not include the immediately pretonic syllable). Broadly speaking, the vowels that condition this occurrence can be described as the sonorous vowels of that dialect. The three different sub-types illustrated above vary with respect to which vowels are considered sonorous enough to have this effect: in the Zhizdra pattern, only the highest sonority vowel [á] conditions a monosyllabic foot, as revealed by lack of the two-pattern reduction system; in the Obojan pattern low vowels and lax mid vowels pattern together in this behavior [á, ɛ́, ɔ́]; and in the Don pattern all the non-high vowels do [á, ɛ́, ɔ́, é, ó]. Put another way, in the Zhizdra pattern (for example), a syllable with stressed [á] is capable of being footed alone, while a syllable with some other stressed vowel must be footed in conjunction with the preceding syllable: for purposes of building feet, a stressed [á] is equivalent to [é] plus another vowel, [í] plus another vowel, or any other non-low stressed vowel plus another vowel. This is shown schematically below. A period stands for a syllable boundary, and square brackets indicate foot boundaries:
specifically to occurrence of this pattern after non-palatalized consonants. I will use the more widespread term "Don" to refer to this pattern, regardless of consonantal environment.
(10) Foot Equivalences in Dissimilative Dialects

Zhizdra:  [Cá]                          =  [CV.Cɛ́], [CV.Cɔ́], [CV.Cé], [CV.Có], [CV.Cí], [CV.Cú]
Obojan:   [Cá], [Cɛ́], [Cɔ́]              =  [CV.Cé], [CV.Có], [CV.Cí], [CV.Cú]
Don:      [Cá], [Cɛ́], [Cɔ́], [Cé], [Có]  =  [CV.Cí], [CV.Cú]
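The equivalences in (10) amount to a claim about mora counts: every foot contains exactly two moras, supplied either by one bimoraic stressed vowel or by two monomoraic vowels. A sketch, with the lengthenable-vowel sets read off (10); the function name and symbols are illustrative assumptions.

```python
# Sketch of (10) as mora arithmetic: each foot totals two moras.
# A stressed vowel counts as bimoraic iff its quality is lengthenable
# in the given dialect (sets per the Zhizdra/Obojan/Don split).

BIMORAIC_UNDER_STRESS = {
    "Zhizdra": {"a"},
    "Obojan":  {"a", "ɛ", "ɔ"},
    "Don":     {"a", "ɛ", "ɔ", "e", "o"},
}

def foot_is_monosyllabic(stressed_vowel, dialect):
    """True if the stressed syllable alone supplies the foot's two moras."""
    return stressed_vowel in BIMORAIC_UNDER_STRESS[dialect]

assert foot_is_monosyllabic("a", "Zhizdra")        # [Cáµµ]
assert not foot_is_monosyllabic("e", "Zhizdra")    # [CVµ.Céµ]
assert foot_is_monosyllabic("e", "Don")
```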
This brings to mind classical weight equivalence phenomena, such as that in Latin where a single long vowel (Vː) is equivalent in weight to two short vowels (VV) or a short vowel plus a coda consonant (VC). In Russian dialects there are no phonemic length contrasts, but assuming (following works such as Repetti 1989) that phonological phenomena can introduce vowels with varying mora counts at the surface level even in languages that do not underlyingly contrast long and short vowels, the Dissimilative variants described above can be accounted for moraically. That is, I analyze the monosyllabic feet displayed in (10) as containing a single bimoraic vowel, and the disyllabic feet as containing two monomoraic vowels. For example, in dialects displaying the Zhizdra pattern, a stressed [á] is structurally bimoraic, while stressed [ɛ́, ɔ́, é, ó, í, ú] are monomoraic: [Cáµµ] vs. [CVµCéµ].5 This result seems phonetically plausible since inherent duration differences (i.e., sonority-based differences in duration) are quite significant in Russian, and since Russian stress has a large duration-based component. Given these two factors, Russian vowels that are both stressed and high in sonority are particularly long. Assuming, following the works of Hubbard (1995) and Broselow, Chen, and Huffman (1997), that moraicity is concretely (if not straightforwardly) linked with phonetic duration, it seems plausible that language learners could interpret these stressed high-sonority vowels as structurally bimoraic. The different Dissimilative variants can be derived by placing limitations on which vowel qualities can lengthen under stress. As predicted by Prince and Smolensky’s (1993) prominence alignment mechanism, the vowels that are most likely to lengthen are those that are segmentally prominent (sonorous). The appropriate constraints for generating this pattern are shown below: 5
It should be noted that stressed [á] is considerably longer in these dialects than unstressed (i.e., moderately reduced) [a]. However, it should also be noted that this is the case in most dialects, since Russian stress is duration-based (Zlatoustova 1981). Since inherent vowel duration differences are quite striking in Russian, it is not surprising that stressed vs. unstressed duration differences are most pronounced with high-sonority stressed vowels and their unstressed counterparts.
Prominence Alignment Constraints: *µµ/i,u » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a,ɑ

As described by Prince and Smolensky, prominence alignment constraint families like the one shown above are produced by “crossing” two phonetic scales. The constraint family shown above was produced by crossing a moraic prominence scale with segmental prominence. Note that the symbol “»” means “dominates” and is used with constraints, while the symbol “‹” means “is less prominent than”, and is not used with constraints.

Moraic Prominence:     µ ‹ µµ  (“1 mora is less prominent than 2.”)
Segmental Prominence:  i,u ‹ e,o ‹ ɛ,ɔ ‹ a,ɑ  (“Low sonority vowels are less prominent than higher sonority ones.”)

Since these scales are arranged from low sonority to high, the constraint family that results from crossing them is a “prominence reduction”6 constraint hierarchy, and defines the types of vowels that are not sonorous enough to co-occur with a bimoraic level of prominence. That is, a constraint like *µµ/i,u expresses the notion that high vowels are not sonorous enough to be bimoraic. By interleaving the members of the *µµ/X constraint family with an additional constraint, it is possible to derive the differences in foot structure observed in the three basic Dissimilative dialects (Don, Obojan, Zhizdra). The constraint that must be used is the Weight-to-Stress Principle (WSP) (Prince and Smolensky 1993). The version of WSP used here is formulated as follows:

WSP: Stressed vowels should be bimoraic.

This constraint, if given full rein, would cause lengthening of all stressed vowels. However, its sphere of influence will be limited by the *µµ/X constraint family discussed above. Specifically, any *µµ/X constraint that dominates WSP will block vowel lengthening for its specific vowel quality. For example, if *µµ/i,u outranks WSP, then stressed high vowels will not be able to lengthen under stress. Similarly, if all the *µµ/X constraints except *µµ/a outrank WSP (as shown below), then only low vowels will lengthen under stress:

*µµ/i,u » *µµ/e,o » *µµ/ɛ,ɔ » WSP » *µµ/a
The term prominence reduction was coined by Jian-King (19xx).
The following tableau illustrates how the ranking shown above derives the correct foot boundary placement for the Zhizdra pattern. Note: in these and subsequent tableaux, only the relevant portion of the *µµ/X constraint family will be shown, due to space considerations.

(11) Lengthening of Stressed [á] Due to WSP and *µµ/a (Zhizdra)

Words with Stressed Low Vowel
/luná/ ‘moon’   *µµ/ɛ,ɔ | WSP | *STRUC-µ | *µµ/a
→ lu(náµµ)              |     | **       | *
  (luµnáµ)              | *!  | **       |
  (luµnáµµ)             |     | ***!     | *

Words with Stressed Non-Low Vowels
/lutɕɔ́k/        *µµ/ɛ,ɔ | WSP | *STRUC-µ | *µµ/a
→ (luµtɕɔ́µk)            | *   | **       |
  lu(tɕɔ́µµk)    *!      |     | **       |
  (luµtɕɔ́µµk)   *!      |     | ***      |
In the first tableau, the optimal output lu(náµµ) shows lengthening of the tonic vowel [a]. The second candidate, *(luµnáµ), without lengthening of the tonic vowel, is ruled out because it violates WSP. In addition, the final candidate, *(luµnáµµ), shows that the immediately pretonic syllable must be left unfooted when the tonic vowel undergoes lengthening, in order to avoid excessive violation of *Struc-µ. In the second tableau, the optimal output (luµtɕɔ́µk) does not have lengthening of the tonic vowel. Lengthening of the tonic vowel would cause a fatal violation—either a fatal violation of *µµ/ɛ,ɔ (as shown in the second row), or a fatal violation of *Struc-µ (as shown in the third row). By changing the ranking of WSP with respect to the *µµ/X constraint family, the Obojan and Don patterns can also be derived. Additionally, by ranking WSP below the entire *µµ/X family, a disyllabic foot shape will always result, since no vowel qualities will be able to lengthen—this is the type of pattern that is seen in the non-dissimilative dialects (including CSR).
UR:WPLS, vol. s2000, no. 1
(12) Possible Rankings for WSP, and Resulting Reduction Patterns

  Ranking                                      | Pattern
  *µµ/i,u » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » WSP    | Non-Dissimilative
  *µµ/i,u » *µµ/e,o » *µµ/ɛ » WSP » *µµ/a      | Zhizdra
  *µµ/i,u » *µµ/e,o » WSP » *µµ/ɛ,ɔ » *µµ/a    | Obojan
  *µµ/i,u » WSP » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a    | Don
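Table (12) can be read as an algorithm: hold the *µµ/X family fixed and slide WSP downward; every vowel whose *µµ/X constraint falls below WSP may lengthen. The following sketch is my own, not from the paper, and verifies the four predicted patterns under the simplifying assumption that each competition pits one lengthened candidate against one short one:

```python
# Sliding WSP through the fixed *µµ/X sonority hierarchy reproduces the
# typology in (12): a vowel lengthens under stress iff its *µµ/X constraint
# is ranked below WSP.

FAMILY = ["*µµ/i,u", "*µµ/e,o", "*µµ/ɛ,ɔ", "*µµ/a"]          # fixed order
QUALITY = {"i": "*µµ/i,u", "u": "*µµ/i,u", "e": "*µµ/e,o",
           "o": "*µµ/e,o", "ɛ": "*µµ/ɛ,ɔ", "ɔ": "*µµ/ɛ,ɔ", "a": "*µµ/a"}

def lengthens(vowel, wsp_pos):
    """Insert WSP at index wsp_pos of FAMILY (0 = top) and ask whether the
    lengthened candidate beats the short one for this stressed vowel."""
    ranking = FAMILY[:wsp_pos] + ["WSP"] + FAMILY[wsp_pos:]
    cands = {"long": {QUALITY[vowel]: 1},   # bimoraic stressed vowel
             "short": {"WSP": 1}}           # monomoraic stressed vowel
    best = min(cands, key=lambda f: tuple(cands[f].get(c, 0) for c in ranking))
    return best == "long"

# WSP at the bottom: non-dissimilative, no vowel lengthens.
assert not any(lengthens(v, 4) for v in "iueoɛɔa")
# WSP above *µµ/a only: Zhizdra -- just /a/ lengthens.
assert [v for v in "iueoɛɔa" if lengthens(v, 3)] == ["a"]
# One notch higher: Obojan; higher still: Don (all non-high vowels lengthen).
assert [v for v in "iueoɛɔa" if lengthens(v, 2)] == ["ɛ", "ɔ", "a"]
assert [v for v in "iueoɛɔa" if lengthens(v, 1)] == ["e", "o", "ɛ", "ɔ", "a"]
```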
At this point, it should be noted that the *µµ/X and WSP constraints need to be dominated by faithfulness constraints for vowel height; otherwise, changes in vowel quality might be expected in order to satisfy the higher-ranking *µµ/X constraints while still satisfying WSP. This can be avoided by ranking the faithfulness constraints Max[+Hi] and Dep[+Low] above the *µµ/X constraints, as shown in the following tableaux:

Tableau (13): Avoidance of Quality Changes to Satisfy *µµ/X and WSP Constraints

  /n-ós/ 'he carried' | Dep[+Lo] | Max[+Hi] | *µµ/i | *µµ/e | *µµ/ɛ | *µµ/a
  ☞ n-óːs             |          |          |       |  *    |       |
    n-áːs             |  *!      |          |       |       |       |  *

  /žízn-/ 'life'      | Dep[+Lo] | Max[+Hi] | *µµ/i | *µµ/e | *µµ/ɛ | *µµ/a
  ☞ žíːzn-            |          |          |  *    |       |       |
    žéːzn-            |          |  *!      |       |  *    |       |
    žɛ́ːzn-           |          |  *!      |       |       |  *    |
    žáːzn-            |  *!      |  *       |       |       |       |  *
In the first tableau, lowering of underlying /o/ to [a] is blocked by Dep[+low]; without this ranking, we would expect the incorrect output candidate *[n-áːs] to emerge as the winner, since it violates a less highly ranked *µµ/X constraint. Similarly, in the second tableau, lowering of input /i/ is also blocked by the faithfulness constraints. Here, a number of lowering possibilities are considered; each is ruled out by either Dep[+low] or Max[+high].
Because the WSP and *µµ/X constraint families derive the correct foot boundaries for the Dissimilative reduction patterns as well as the non-dissimilative ones, the same ranking of *Nonmoraic/-high correctly derives extreme reduction in both types of dialect.
3.1. Analyzing Moderate Reduction

Now that extreme vowel reduction is accounted for, I will turn to the analysis of moderate reduction. Recall that in the current approach, moderate vowel reduction occurs in moraic unstressed syllables, where it generates a vowel sub-inventory containing only the peripheral vowels [i,u,a] in the output. To account for this fact, I will propose the following licensing constraint:

Lic-Nonperiph/Stress: A nonperipheral vowel may not occur in the output unless under stress.

Note that this constraint does not refer to moraicity; instead, it applies to all unstressed vowels. However, since nonmoraic unstressed syllables are also subject to *Nonmoraic/-high, the effects of the Lic-Nonperiph/Stress constraint will only be visible in the complement of these two sets, viz. in moraic unstressed syllables. To avoid violating the Lic-Nonperiph/Stress constraint, unstressed mid vowels will have to either raise to the high peripheral vowels [i,u] or lower to the peripheral vowel [a]. As explained previously, different dialects choose differently in this respect. I will begin with an analysis of [a]-reduction below, along with a discussion of how the analyses for moderate and extreme reduction combine. Afterwards, I will work through the other types of moderate reduction described in section 2.0.3.

3.1.1. Moderate Neutralizations in [a]-reduction

In [a]-reduction, unstressed /e,o/ both reduce to [a] in the immediately pretonic syllable, regardless of the palatality of the preceding consonant. This neutralization pattern is observed in many south Russian dialects, including those displaying the Dissimilative variants discussed above. (For this reason, they are traditionally referred to as dissimilative [a]-reduction dialects.) In order to derive reduction via lowering, the faithfulness constraint Dep[+low] must be dominated by both Lic-Nonperiph/Stress and Max[-high].
This is demonstrated in the following tableau for reduction after a non-palatalized consonant. In this and
subsequent tableaux, I will present disyllabic foot forms with monomoraic stressed vowels, unless otherwise noted.

Tableau (14): Moderate Neutralization Via [a]-reduction (Max[-high] » Dep[+low])

  /domá/ 'houses'      | Lic-Nonperiph/Stress | Max[-Hi] | Dep[+Lo]
  ☞ (damá)             |                      |          |  *
    (dumá)             |                      |  *!      |
    (dimá)             |                      |  *!      |
    (domá)             |  *!                  |          |
    (demá)             |  *!                  |          |
    (dəmá)             |  *!                  |          |

  /sadú/ 'garden' (loc.) | Lic-Nonperiph/Stress | Max[-Hi] | Dep[+Lo]
  ☞ (sadú)               |                      |          |
    (sudú)               |                      |  *!      |
    (sidú)               |                      |  *!      |
    (sodú)               |  *!                  |          |
    (sedú)               |  *!                  |          |
    (sədú)               |  *!                  |          |
Here, the last three candidate forms in each tableau are ruled out by Lic-Nonperiph/Stress: they all contain a nonperipheral vowel that is not stressed. The second and third candidates are both ruled out for deleting an underlying [-high] specification, in violation of Max[-high]. The winner violates only the low-ranked constraint Dep[+low] (and only in the first tableau), since a [+low] specification has been inserted which was not present underlyingly. It should also be pointed out that Max[-high] must also dominate the faithfulness constraints for [front] and [round]: if Max[+front] or Max[round] were ranked above Max[-high], they could force reduction via raising in order to preserve the palatality or rounding of the underlying vowel. Since this is not the case in the pattern under consideration, it must be the case that Max[-high] » Max[+front], Max[round]. Now let's look at [a]-reduction after a palatalized consonant. Recall that [a]-reduction is not affected by this environment: unstressed nonperipheral
vowels in the immediately-pretonic syllable reduce via lowering to [a] regardless of the quality of the preceding consonant. This results from the ranking Max[-high] » C-/[+front], as shown in the following tableau for unstressed /o/ preceded by a palatalized consonant:

(15) Moderate Neutralization of /o/ via [a]-reduction: After a Palatalized Consonant

  /p-okú/ 'I bake'     | Lic-Nonperiph/Stress | Max[-Hi] | C-/[+Ft] | Dep[+Lo]
  ☞ (p-akú)            |                      |          |  *       |  *
    (p-ukú)            |                      |  *!      |  *       |
    (p-ikú)            |                      |  *!      |          |
    (p-okú)            |  *!                  |          |  *       |
    (p-ekú)            |  *!                  |          |          |
    (p-əkú)            |  *!                  |          |  *       |
The same result is also generated for unstressed /e/ and /a/ preceded by a palatalized consonant:

(16) Moderate Neutralization of /a/ or /e/ via [a]-reduction: After Palatalized

  /r-eká/ 'river'      | Lic-Nonperiph/Stress | Max[-Hi] | C-/[+Ft] | Dep[+Lo]
  ☞ (r-aká)            |                      |          |  *       |  *
    (r-uká)            |                      |  *!      |  *       |
    (r-iká)            |                      |  *!      |          |
    (r-oká)            |  *!                  |          |  *       |
    (r-eká)            |  *!                  |          |          |
    (r-əká)            |  *!                  |          |  *       |
  /p-at-í/ 'five' (gen.) | Lic-Nonperiph/Stress | Max[-Hi] | C-/[+Ft] | Dep[+Lo]
  ☞ (p-at-í)             |                      |          |  *       |
    (p-ut-í)             |                      |  *!      |  *       |
    (p-it-í)             |                      |  *!      |          |
    (p-ot-í)             |  *!                  |          |  *       |
    (p-et-í)             |  *!                  |          |          |
    (p-ət-í)             |  *!                  |          |  *       |
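The moderate [a]-reduction competitions in tableaux (14)-(16) can be checked with the same lexicographic comparison of violation profiles. This is my own sketch, not part of the analysis itself; violation counts are entered by hand for a reduced candidate set, under the ranking Lic-Nonperiph/Stress » Max[-hi] » C-/[+front] » Dep[+lo]:

```python
# Moderate [a]-reduction: nonperipheral vowels are banned from unstressed
# syllables, lowering beats raising, and this holds even after a
# palatalized consonant because Max[-hi] outranks C-/[+front].

RANKING = ["Lic-Nonperiph/Stress", "Max[-hi]", "C-/[+front]", "Dep[+lo]"]

def optimal(candidates):
    return min(candidates,
               key=lambda f: tuple(candidates[f].get(c, 0) for c in RANKING))

# Unstressed /o/ in /domá/ 'houses': lowering to [a] wins.
doma = {
    "damá": {"Dep[+lo]": 1},
    "dumá": {"Max[-hi]": 1},
    "dimá": {"Max[-hi]": 1},
    "domá": {"Lic-Nonperiph/Stress": 1},
    "demá": {"Lic-Nonperiph/Stress": 1},
}
assert optimal(doma) == "damá"

# Unstressed /o/ in /p-okú/ 'I bake': lowering still wins after the
# palatalized consonant, at the cost of a low-ranked C-/[+front] violation.
poku = {
    "p-akú": {"C-/[+front]": 1, "Dep[+lo]": 1},
    "p-ikú": {"Max[-hi]": 1},
    "p-ukú": {"Max[-hi]": 1, "C-/[+front]": 1},
    "p-okú": {"Lic-Nonperiph/Stress": 1, "C-/[+front]": 1},
    "p-ekú": {"Lic-Nonperiph/Stress": 1},
}
assert optimal(poku) == "p-akú"
```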
3.1.2. Combining Moderate and Extreme Reductions

Now that we have examined both the moderate and extreme vowel neutralizations in isolation, let's look at them in combination to ensure that the constraints and rankings discussed separately do not produce incorrect results when combined. In particular, the C-/[+front] constraint must be ranked in a manner such that it will motivate reduction to [i] in cases of extreme reduction, but not in cases of moderate reduction. The ranking Max[-high] » C-/[+front] suggested above has the appropriate effect. Recall from tableau (8) that the ranking of C-/[+front] and Max[-high] cannot be determined based only on the evidence provided by extreme reduction; in other words, the correct extreme reduction patterns result from either ranking. In the [a]-reduction dialects, the ranking must be Max[-high] » C-/[+front], since this ranking avoids reduction to [i] in contexts of moderate reduction. To see how this works, compare the following two tableaux illustrating reduction of unstressed /o/ in both extreme and moderate reduction cases:

(17) Moderate Reduction of /o/ After Palatalized: Full Constraint Set

  /t-opló/ 'warmly' | *Nonmoraic/-high | Lic-Nonperiph/Stress | Max[+Ft] | Max[-Hi] | C-/[+Ft] | Dep[+Hi]
  ☞ (t-apló)        |                  |                      |          |          |  *       |
    (t-ipló)        |                  |                      |          |  *!      |          |  *
    (t-upló)        |                  |                      |          |  *!      |  *       |  *
    (t-epló)        |                  |  *!                  |          |          |          |
    (t-opló)        |                  |  *!                  |          |          |  *       |
    (t-əpló)        |                  |  *!                  |          |          |  *       |
(18) Extreme Reduction of /o/ after Palatalized: Full Constraint Set
(Note: only violations for the first unstressed vowel are shown.)

  /t-oploxód/ 'motorized ship' | *Nonmoraic/-high | Lic-Nonperiph/Stress | Max[+Ft] | Max[-Hi] | C-/[+Ft] | Dep[+Hi]
  ☞ t-i(plaxót)                |                  |                      |          |  *       |          |  *
    t-u(plaxót)                |                  |                      |          |  *       |  *!      |  *
    t-ə(plaxót)                |                  |  *!                  |          |  *       |  *       |
    t-a(plaxót)                |  *!              |                      |          |          |  *       |
    t-e(plaxót)                |  *!              |  *                   |          |          |          |
    t-o(plaxót)                |  *!              |  *                   |          |          |  *       |
Note that in the first tableau, an [a]-reduced form is the optimal candidate, due in part to the fact that it satisfies the Max[-high] constraint. This ranking allows the [a]-reduced form to win despite the fact that it violates the C-/[+front] constraint. In the second tableau, however, the [a]-reduced form is cast out of the running at an early stage by the *Nonmoraic/-high constraint. In other words, although [a]-reduction is optimal in this dialect with respect to the ranking Max[-high] » C-/[+front], it produces a vowel that is too sonorous to be used under extreme reduction, due to the higher-ranking constraint *Nonmoraic/-high. Therefore, in [a]-reduction dialects the C-/[+front] constraint is able to play a decisive role under extreme reduction, but not under moderate reduction. The remainder of this section will demonstrate how additional moderate neutralization patterns can be derived.

3.1.3. Dialects with [i]-reduction

A number of Central Russian dialects, including CSR, show a pattern referred to as [i]-reduction. This pattern is similar to [a]-reduction (discussed above) in that /a,o/ neutralize to highly sonorous [a] in the immediately pretonic syllable. In [i]-reduction, however, this neutralization is blocked when the vowel is preceded by a palatalized consonant: in that case, /o,a/ reduce to [i] in the immediately pretonic syllable, just as in the non-immediately-pretonic syllables. Underlying /e/ reduces to [i] regardless of the preceding consonant. The example forms listed below are taken from CSR.
(19) Moderate neutralization via [i]-reduction

(The original presents vowel-triangle diagrams for the immediately pretonic syllable: after non-palatalized consonants, /o,a/ collapse to [a] and /e/ raises to [i]; after palatalized consonants, /e,o,a/ all collapse to [i].)

After Non-Palatalized:
  /etáž/   [itáš]    'floor, story'        cf. variant [etáš]
  /davát-/ [davát-]  'to give' (iter.)     cf. [dát-] 'to give'
  /kotá/   [katá]    'cat' (gen. sg.)      cf. [kót] 'cat' (nom. sg.)

After Palatalized:
  /r-eká/  [r-iká]   'river'               cf. [r-éčka] 'little river'
  /p-at-í/ [p-ití]   'five' (gen. sg.)     cf. [p-át-] 'five' (nom. sg.)
  /t-opló/ [t-ipló]  'warmly'              cf. [t-óplij] 'warm'
In this type of pattern, reduction of /o,a/ to [i] after a palatalized consonant occurs in both moderate and extreme reduction contexts. This pattern can be accounted for by ranking Max[-high] below C-/[+front]. (This is the opposite of the [a]-reduction pattern described above.) This ranking will allow C-/[+front] to motivate reduction to [i] in all unstressed syllables:

(20) Moderate Neutralization via [i]-reduction: C-/[+front] » Max[-high]

  /n-oslí/ 'they carried' | *Nonmoraic/-high | Lic-Nonperiph/Stress | Max[+Ft] | C-/[+Ft] | Max[-Hi] | Dep[+Lo]
  ☞ (n-islí)              |                  |                      |          |          |  *       |
    (n-aslí)              |                  |                      |          |  *!      |          |  *
    (n-uslí)              |                  |                      |          |  *!      |  *       |
    (n-əslí)              |                  |  *!                  |          |  *       |          |
    (n-eslí)              |                  |  *!                  |          |          |          |
    (n-oslí)              |                  |  *!                  |          |  *       |          |
Since Max[-high] still dominates Dep[+low], reduction via lowering is still produced when the vowel is not preceded by a palatalized consonant, as shown below:

(21) Moderate Reduction in an [i]-reduction Dialect after Nonpalatalized

  /domá/ 'homes' | *Nonmoraic/-high | Max[+Ft] | Lic-Nonperiph/Stress | C-/[+Ft] | Max[-Hi] | Dep[+Lo]
  ☞ (damá)       |                  |          |                      |          |          |  *
    (dimá)       |                  |          |                      |          |  *!      |
    (dumá)       |                  |          |                      |          |  *!      |
    (dəmá)       |                  |          |  *!                  |          |          |
    (demá)       |                  |          |  *!                  |          |          |
    (domá)       |                  |          |  *!                  |          |          |
3.1.4. Dialects with [e]-reduction

The diachronic predecessor of [i]-reduction is a reduction pattern in which /e/ does not undergo reduction, and in which unstressed /o,a/ after palatalized consonants also surface as [e]. This pattern, "[e]-reduction", still exists in many dialects in the Moscow region, and was at one time characteristic of Moscow pronunciation.

(22) Moderate Neutralization via [e]-reduction

(The original presents vowel-triangle diagrams: after palatalized consonants, unstressed /e/ remains [e] in the immediately pretonic syllable, and /o,a/ also surface as [e].)

  /r-eká/ [r-eká] 'river'            cf. [r-éčka] 'little river'
  /p-atí/ [p-etí] 'five' (gen. sg.)  cf. [p-át-] 'five' (nom. sg.)
  /n-osú/ [n-esú] 'I carry'          cf. [n-ós] 'he carried'

Based on the available descriptions of [e]-reduction, there seem to be at least three variants of this pattern. The variant described above seems to be the one
most commonly described in dialectological handbooks (cf. Avanesov and Orlova (1964), Kuznetsov (1973), and Kasatkin (1989)): reduction to [e] only in the immediately pretonic syllable, with reduction to [i] elsewhere. The analysis for this variant will be presented first; discussion of the two other [e]-reduction patterns will follow. The [e]-reduction pattern differs from the [i]-reduction analysis given above in that the Lic-Nonperiph/Stress constraint is ranked lower: the constraints Max[+front], C-/[+front], and Max[-high] have been promoted to a position above Lic-Nonperiph/Stress, but below *Nonmoraic/-high. As shown below, this blocks raising:

(23) Moderate Reduction of /e/ via [e]-reduction

  /r-eká/ 'river' | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Max[-Hi] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ (r-eká)       |                  |             |          |          |  *                   |
    (r-iká)       |                  |             |          |  *!      |                      |
    (r-oká)       |                  |  *!         |  *       |          |  *                   |
    (r-aká)       |                  |  *!         |  *       |          |                      |  *
    (r-əká)       |                  |  *!         |  *       |          |  *                   |
    (r-uká)       |                  |  *!         |          |  *       |                      |

Here, the optimal candidate retains the underlying mid vowel in an unstressed syllable, despite the fact that this violates Lic-Nonperiph/Stress. To do otherwise would mean raising to [i], which violates the more highly-ranked constraint Max[-high], or lowering to [a] or centralizing to [ə], both of which violate Max[+front]. The same constraints and rankings also correctly predict that unstressed /o,a/ will reduce to [e] when preceded by a palatalized consonant. This is illustrated below for underlying /o/:
(24) Moderate Reduction of /o/ via [e]-reduction (after palatalized)

  /t-opló/ 'warmly' | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Max[-Hi] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ (t-epló)        |                  |             |          |          |  *                   |
    (t-ipló)        |                  |             |          |  *!      |                      |
    (t-opló)        |                  |             |  *!      |          |  *                   |
    (t-apló)        |                  |             |  *!      |          |                      |  *
    (t-əpló)        |                  |             |  *!      |          |  *                   |
    (t-upló)        |                  |             |  *!      |  *       |                      |
Note that this re-ranking does not mean that unstressed /o/ can remain unreduced: the presence of Lic-Nonperiph/Stress, even in a lowly-ranked position, will still motivate reduction, as shown below:

(25) Moderate Reduction of /o/ in an [e]-reduction Dialect (after nonpalatalized)

  /domá/ 'homes' | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Max[-Hi] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ (damá)       |                  |             |          |          |                      |  *
    (demá)       |                  |             |          |          |  *!                  |
    (domá)       |                  |             |          |          |  *!                  |
    (dəmá)       |                  |             |          |          |  *!                  |
    (dimá)       |                  |             |          |  *!      |                      |
    (dumá)       |                  |             |          |  *!      |                      |
Here, the constraints Max[+front] and C-/[+front] have no effect; in the analysis of [i]-reduction, they helped to motivate raising. In the case of underlying /o,a/ not preceded by a palatalized consonant, there is nothing that would motivate reduction via raising; instead, the default pattern of reduction via lowering is maintained. This analysis of [e]-reduction can be summarized by saying that the higher rank of Max[+front] and C-/[+front] blocks lowering after C-, while the higher rank of Max[-high] simultaneously blocks raising to [i].
Finally, by keeping the constraint *Nonmoraic/-high in an undominated position, the [e]-reduction pattern is limited to the syllable immediately preceding the stress. This is demonstrated in the following tableau:

(26) Extreme Reduction of /e/ in an [e]-reduction Dialect

  /r-ečevój/ 'vocal' | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Max[-Hi] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ r-i(čevój)       |                  |             |          |  *       |                      |
    r-ə(čevój)       |                  |  *!         |  *       |  *       |                      |
    r-u(čevój)       |                  |  *!         |          |  *       |                      |
    r-e(čevój)       |  *!              |             |          |          |  *                   |
    r-a(čevój)       |  *!              |  *          |  *       |          |                      |  *
    r-o(čevój)       |  *!              |  *          |  *       |          |  *                   |
(27) Extreme Reduction of /o/ in an [e]-reduction Dialect

  /t-oplovátoj/ 'warmish' | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Max[-Hi] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ t-i(plavá)tij         |                  |             |          |  *       |                      |
    t-u(plavá)tij         |                  |             |  *!      |  *       |                      |
    t-ə(plavá)tij         |                  |             |  *!      |  *       |                      |
    t-e(plavá)tij         |  *!              |             |          |          |  *                   |
    t-o(plavá)tij         |  *!              |             |  *       |          |  *                   |
    t-a(plavá)tij         |  *!              |             |  *       |          |                      |  *
However, Avanesov (1984) describes a different [e]-reduction pattern as being characteristic of some speakers of the pre-Revolutionary Moscow pronunciation norm ("Old Muscovite"). In this variant, reduction to [e] occurs in all unstressed syllables, regardless of their position with respect to the stressed syllable. This pattern results from promoting Max[-high] not only above one reduction constraint (as illustrated in tableaux (26) and (27) above), but above both reduction constraints. This is demonstrated in the following tableaux:
(28) Avanesov's variant of [e]-reduction; underlying /e/

  /r-ečevój/ 'vocal' | Max[-Hi] | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ r-e(čevój)       |          |  *               |             |          |  *                   |
    r-a(čevój)       |          |  *               |  *!         |  *       |                      |  *
    r-o(čevój)       |          |  *               |  *!         |  *       |  *                   |
    r-i(čevój)       |  *!      |                  |             |          |                      |
    r-ə(čevój)       |  *!      |                  |  *          |  *       |                      |
    r-u(čevój)       |  *!      |                  |  *          |          |                      |
(29) Avanesov's variant of [e]-reduction; underlying /o/

  /t-oplovátoj/ 'warmish' | Max[-Hi] | *Nonmoraic/-high | Max[+Front] | C-/[+Ft] | Lic-Nonperiph/Stress | Dep[+Lo]
  ☞ t-e(plavá)tej         |          |  *               |             |          |  *                   |
    t-o(plavá)tej         |          |  *               |             |  *!      |  *                   |
    t-a(plavá)tej         |          |  *               |             |  *!      |                      |  *
    t-i(plavá)tej         |  *!      |                  |             |          |                      |
    t-u(plavá)tej         |  *!      |                  |             |  *       |                      |
    t-ə(plavá)tej         |  *!      |                  |             |  *       |                      |
Finally, in certain dialects of Belarusian described by Lamprecht (1987), reduction to [e] occurs only in the elsewhere environment: in the syllable immediately preceding the stress, [a]-reduction occurs, while the remaining unstressed syllables show [e]-reduction. This pattern can be generated by maintaining the higher rank for Max[-high] used in Avanesov's [e]-reduction variant (cf. (28) and (29) above), but reversing the ranking of Lic-Nonperiph and C-/[+front] used in the basic [e]-reduction pattern (cf. (26) and (27) above).

3.1.5. Attenuated [a]-reduction

In a number of dialects in the South Russian dialect area, reduction to [i] also occurs in the immediately pretonic syllable, similar to the [i]-reduction pattern described immediately above, but only when the unstressed vowel in question is flanked by palatalized consonants on both sides. In unfooted unstressed positions, reduction to [i] occurs after palatalized consonants as normal (no double-sided environment is necessary). This pattern is traditionally
referred to as "attenuated [a]-reduction", since the tendency to reduce to [a] is attenuated, failing to apply in the double-sided palatalization environment. This pattern occurs in a number of central and southern dialects in the regions of Moscow, Kalinin, and Tula.

(30) Moderate Neutralization via attenuated [a]-reduction

(The original presents vowel-triangle diagrams for three environments in the immediately pretonic syllable: after non-palatalized and after palatalized consonants, /e,o,a/ collapse to [a]; between palatalized consonants, they collapse to [i].)

After Palatalized:
  /r-eká/  [r-aká]  'river'            cf. [r-éčka] 'little river'
  /p-atá/  [p-atá]  'heel' (sg.)       cf. [p-áti] 'heels'
  /t-opló/ [t-apló] 'warmly'           cf. [t-óplij] 'warm'

Between Palatalized:
  /r-ečnój/ [r-ičnój] 'river' (adj.)   cf. [r-éčka] 'little river'
  /p-at-í/  [p-ití]   'five' (gen. sg.) cf. [p-át-] 'five' (nom. sg.)
  /t-oléts/ [t-iléts] 'calf'           cf. [t-ólka] 'heifer'
This pattern can be derived by adding a double-sided phonotactic constraint such as the following:

C-_C-: A vowel may not occur between two palatalized consonants unless it is [+front].

By sandwiching the Max[-high] constraint between the double-sided palatalization constraint and the single-sided palatalization constraint, the correct results will be produced: the high-ranking C-_C- constraint will force reduction to [i] in all appropriate contexts, including footed positions, due to the ranking C-_C- » Max[-high]. The low-ranking constraint C-/[+front] will also motivate reduction to [i], but will be blocked in moraic unstressed syllables by the ranking Max[-high] » C-/[+front].
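The sandwich ranking just described can be illustrated with the same violation-profile sketch used earlier. This is my own illustration, with violation counts entered by hand:

```python
# Attenuated [a]-reduction: the double-sided constraint C-_C-/[+front]
# outranks Max[-hi], while single-sided C-/[+front] is ranked below it.

RANKING = ["Lic-Nonperiph/Stress", "C-_C-/[+front]",
           "Max[-hi]", "C-/[+front]", "Dep[+lo]"]

def optimal(candidates):
    return min(candidates,
               key=lambda f: tuple(candidates[f].get(c, 0) for c in RANKING))

# /p-at-í/ 'five (gen.)': the pretonic vowel is FLANKED by palatalized
# consonants, so the double-sided constraint forces raising to [i].
flanked = {
    "p-it-í": {"Max[-hi]": 1},
    "p-at-í": {"C-_C-/[+front]": 1, "C-/[+front]": 1},
    "p-et-í": {"Lic-Nonperiph/Stress": 1},
}
assert optimal(flanked) == "p-it-í"

# /p-atá/ 'heel': only the preceding consonant is palatalized, so
# Max[-hi] » C-/[+front] keeps the vowel low, as in plain [a]-reduction.
unflanked = {
    "p-atá": {"C-/[+front]": 1},
    "p-itá": {"Max[-hi]": 1},
    "p-etá": {"Lic-Nonperiph/Stress": 1},
}
assert optimal(unflanked) == "p-atá"
```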
3.1.6. Incomplete okan'e

The last dialect type to be considered in this chapter is traditionally referred to as "incomplete okan'e". In this type of dialect, the mid vowels do not reduce at all in the immediately pretonic syllable: unstressed /o/ remains [o], and unstressed /e/ remains [e]. However, in other unstressed syllables, extreme reduction occurs as expected. This type of pattern is characteristic of the Vladimir-Volga Basin dialect group, which is found in an area transitional between the vowel-reducing Central dialects and the non-reducing Northern dialects. According to Vysotskii's (1973) phonetic survey of dialect vowel duration, the Vladimir-Volga Basin group is similar to the central Russian dialects in terms of the duration of the immediately-pretonic vowel, and is usually considered a member of the Central dialect group. In Kasatkina's (1996) terminology, the Vladimir-Volga Basin group displays the "strong nucleus and weak periphery" pattern. This contrasts with the vowel duration results that Vysotskii (1973) reports for the northern neighbors of the Vladimir-Volga Basin group, which would be more similar to the "wave" rhythmic pattern. The Vladimir-Volga Basin "incomplete okan'e" pattern is easily assimilated into the analysis already provided by demoting the Lic-Nonperiph/Stress constraint below Max[-low], thus making it preferable to deploy mid vowels in unstressed syllables rather than incur a faithfulness violation. However, this does not prevent *Nonmoraic/-high from producing vowel reduction in the remaining unstressed, nonmoraic syllables. In the nonmoraic syllables, unstressed /o/ and /e/ are not allowed to surface, but this is due to their relatively sonorous status, not to their nonperipheral nature. Of course, if the remaining unstressed vowels were to regain their moraic status (due to a low rank of *Struc-µ), there would be no chance for vowel reduction at all under this grammar.
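The effect of demoting the licensing constraint can be sketched in the same style as before. This is my own illustration; the particular faithfulness constraints used for the demotion here are an assumption on my part:

```python
# Incomplete okan'e: Lic-Nonperiph/Stress is demoted below faithfulness,
# so a MORAIC pretonic mid vowel survives intact, while undominated
# *Nonmoraic/-high still forces extreme reduction elsewhere.

RANKING = ["*Nonmoraic/-high", "Max[-hi]", "Dep[+lo]", "Lic-Nonperiph/Stress"]

def optimal(candidates):
    return min(candidates,
               key=lambda f: tuple(candidates[f].get(c, 0) for c in RANKING))

# Immediately pretonic (moraic) /o/: keeping [o] beats raising or lowering.
moraic_o = {
    "o": {"Lic-Nonperiph/Stress": 1},
    "a": {"Dep[+lo]": 1},
    "i": {"Max[-hi]": 1},
}
assert optimal(moraic_o) == "o"

# Nonmoraic unstressed /o/: [o] and [a] are now too sonorous, so reduction
# goes through despite the faithfulness violation.
nonmoraic_o = {
    "o": {"*Nonmoraic/-high": 1, "Lic-Nonperiph/Stress": 1},
    "a": {"*Nonmoraic/-high": 1, "Dep[+lo]": 1},
    "i": {"Max[-hi]": 1},
}
assert optimal(nonmoraic_o) == "i"
```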
The absence of any reduction under a low-ranked *Struc-µ is a desirable effect, in that it allows us to account for those Northern dialects that lack vowel reduction entirely. As described by Vysotskii (1973), the non-reducing dialects do not make sharp durational distinctions among different types of unstressed syllable; it would therefore be reasonable to assume that the Northern dialects lacking reduction altogether have a much lower ranking for the constraint *Struc-µ, which prevents the occurrence of nonmoraic vowels and, as a result, of extreme reduction in these dialects.

3.1.7. Additional Dissimilative Reduction Patterns

As demonstrated in the preceding sections, the two-constraint approach to Russian vowel reduction is capable of accounting for a number of dialectal neutralization patterns, including reduction to [a], [e], [i], and [ə]. Furthermore,
by having two separate reduction constraints, it is possible to block one or the other constraint while allowing the remaining one to apply unfettered. By linking the distribution of extreme and moderate reduction patterns to foot form and moraic distribution, the same analysis was easily extended to include the Dissimilative vowel reduction patterns. However, only the three most basic Dissimilative variants were discussed. Extending this analysis to additional, more complex Dissimilative variants provides strong additional support for it: by exploring the interaction between vowel reduction constraints and foot-form constraints, the role of prominence constraints in Russian vowel reduction becomes even more evident. In this section, I present additional Dissimilative variants, and their analyses, that highlight this role. Let me start by pointing out that the preceding sections have proceeded on the assumption that the constraints used in accounting for foot form do not interact with the constraints on vowel reduction. For example, interactions between, say, *µµ/X and C-/[+front] were considered irrelevant. In point of fact, however, this is not the case. Recall that in the [a]-reduction pattern, the immediately pretonic vowel will reduce to [a] even after a palatalized consonant. This means that the constraints Dep[+high] and Max[-high] must outrank the constraint C-__/[+front] in dialects with this pattern. In other words, in the [a]-reduction pattern, it is more important to avoid raising than it is to obey C-__/[+front]. However, even in this low-ranked position, it is possible for C-__/[+front] to motivate reduction to [i] in the immediately pretonic syllable: namely, by forcing the word to have a monosyllabic foot, thus exposing the immediately pretonic syllable to extreme reduction. In order to prevent this state of affairs, it must be the case that all of the *µµ/X constraints outrank C-__/[+front].
That is, if *µµ/a outranks C-__/[+front], it will not be possible for C-__/[+front] to force a monosyllabic footing, since doing so would violate the more highly-ranked *µµ/X constraints that limit lengthening. The opposite ranking would allow lengthening of (some subset of) stressed vowels just in case the immediately pretonic vowel is preceded by a palatalized consonant. This is an attested pattern: all three of the basic Dissimilative variants occur in certain dialects only in contexts after palatalized consonants (a non-dissimilative pattern is observed after non-palatalized consonants). The opposite pattern, with dissimilation only after plain consonants and non-dissimilative reduction after palatalized consonants, is unattested (Kuznetsov 1973). Dissimilative reduction limited to contexts after palatalized consonants can be summarized as follows:
(31) Dissimilative Variants Limited to Contexts After C-

  Pattern                 | Ranking
  No dissimilation at all | *µµ/i,u » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » C-__/[+front], WSP
  Zhizdra after C- only   | *µµ/i,u » *µµ/e,o » *µµ/ɛ » C-__/[+front] » *µµ/a » WSP
  Obojan after C- only    | *µµ/i,u » *µµ/e,o » C-__/[+front] » *µµ/ɛ,ɔ » *µµ/a » WSP
  Don after C- only       | *µµ/i,u » C-__/[+front] » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » WSP
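The footing competition behind these rankings can also be sketched with hand-entered violation profiles (my own illustration): a disyllabic foot keeps the pretonic vowel moraic, reducing it only to [a] and violating C-__/[+front], while a monosyllabic foot lengthens the stressed vowel, exposing the pretonic syllable to extreme reduction to [i].

```python
# "Zhizdra after C- only" ranking: C-__/[+front] is interleaved just above
# *µµ/a, so only a stressed low vowel licenses the monosyllabic footing.

RANKING = ["*µµ/i,u", "*µµ/e,o", "*µµ/ɛ,ɔ", "C-__/[+front]", "*µµ/a", "WSP"]
STRESSED = {"i": "*µµ/i,u", "e": "*µµ/e,o", "ɛ": "*µµ/ɛ,ɔ", "a": "*µµ/a"}

def foot_shape(stressed_vowel):
    """Compete the two footings for a pretonic vowel after C-."""
    cands = {
        # disyllabic foot: pretonic [a] (violates C-__/[+front]),
        # stressed vowel stays short (violates WSP)
        "disyllabic": {"C-__/[+front]": 1, "WSP": 1},
        # monosyllabic foot: stressed vowel lengthens (violates its *µµ/X),
        # pretonic is nonmoraic and surfaces as [i]
        "monosyllabic": {STRESSED[stressed_vowel]: 1},
    }
    return min(cands,
               key=lambda f: tuple(cands[f].get(c, 0) for c in RANKING))

# Stressed /a/: lengthening is cheap, so dissimilation (pretonic [i]) wins.
assert foot_shape("a") == "monosyllabic"
# Stressed mid or high vowels: lengthening is blocked, pretonic stays [a].
assert foot_shape("ɛ") == "disyllabic"
assert foot_shape("i") == "disyllabic"
```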
As shown in the table above, if the constraint C-__/[+front] is ranked immediately above *µµ/ɛ,ɔ, then the vowels /ɛ,ɔ,a/ will all lengthen under stress just in case the immediately pretonic vowel is preceded by a palatalized consonant. This forces the immediately pretonic vowel in such contexts to be unfooted and nonmoraic, and thus subject to extreme reduction, resulting in reduction of the pretonic vowel to [i] and satisfying C-__/[+front]. As mentioned in the preceding paragraph, these patterns are attested. By introducing the double-sided constraint C-__C-/[+front], this typology can be extended even further. If C-__C-/[+front] is ranked immediately above C-__/[+front], with no constraints intervening in the hierarchies illustrated in (31), the same output patterns will result. However, if C-__C-/[+front] is ranked above the entire *µµ/X constraint family, the result will be dissimilative reduction only in the non-flanked C-VC environment. In the flanked environment, C-__C-/[+front] forces a monosyllabic footing in all cases, resulting in pretonic reduction to [i]. This type of pattern is attested, and is referred to as Dissimilative/Attenuated vowel reduction.7 This type of pattern is summarized in
7 Traditional Russian dialectological pattern names of the type X/Y are easily distinguished from similar names of the form Y/X using the following mnemonic: the pattern name given first is the pattern observed in the context CVC- (non-flanked), while the second pattern name given is the pattern observed in the context C-VC- (flanked). For example, Dissimilative/Attenuated patterns follow a dissimilative pattern in CVC-, and follow attenuated [a]-reduction in C-VC-. Likewise, Attenuated/Dissimilative variants follow attenuated [a]-reduction in CVC-, and follow a dissimilative pattern in C-VC-.
(32) below; the rankings generating each pattern are provided in (33). As illustrated, only two of the three possible Dissimilative/Attenuated patterns predicted by this typology are attested.

(32) Dissimilative/Attenuated Reduction Patterns

Dissimilative/Attenuated I (most common):
- Non-flanked context (C-VC__́): stressed /i,u,e,o,ɛ,ɔ/ condition pretonic reduction to [a]; stressed /a/ conditions reduction to [i].
- Flanked context (C-VC-__́): all stressed vowels condition reduction to [i].

Dissimilative/Attenuated II:
- Non-flanked context (C-VC__́): stressed /i,u,e,o/ condition pretonic reduction to [a]; stressed /ɛ,ɔ,a/ condition reduction to [i].
- Flanked context (C-VC-__́): all stressed vowels condition reduction to [i].
As shown in the illustrations, the Dissimilative/Attenuated patterns are basically Dissimilative reduction patterns that are interrupted, or blocked, in contexts flanked by palatalized consonants, where [i]-reduction is the observed reduction pattern instead. Dissimilative/Attenuated I is an interrupted version of the Zhizdra Dissimilative pattern: the Zhizdra reduction pattern occurs except in contexts flanked by palatalized consonants. Similarly, Dissimilative/Attenuated II is an interrupted version of the Obojan Dissimilative pattern.
(33) Rankings for Dissimilative/Attenuated Reduction Patterns

  Pattern                                              | Ranking
  No dissimilation at all (attenuated [a]-reduction)   | C-__C-/[+front] » *µµ/i,u » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » C-__/[+front], WSP
  Dissimilative/Attenuated I (cf. Zhizdra)             | C-__C-/[+front] » *µµ/i,u » *µµ/e,o » *µµ/ɛ » C-__/[+front] » *µµ/a » WSP
  Dissimilative/Attenuated II (cf. Obojan)             | C-__C-/[+front] » *µµ/i,u » *µµ/e,o » C-__/[+front] » *µµ/ɛ,ɔ » *µµ/a » WSP
  Unattested Dissimilative/Attenuated pattern (cf. Don)| C-__C-/[+front] » *µµ/i,u » C-__/[+front] » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » WSP
If, on the other hand, C-__C-/[+front] is the constraint that is interleaved among the members of *µµ/X, and C-__/[+front] is left at the bottom of the *µµ/X constraint family, we will find [a]-reduction in the non-flanked environment C-VC (cf. the first ranking in (31)), while simultaneously finding a Dissimilative reduction pattern in the flanked environment C-__C-. This type of pattern is attested, and is referred to as Attenuated/Dissimilative reduction (see fn. 7). Attested Attenuated/Dissimilative variants are summarized in (34); the rankings generating these patterns are illustrated in (35).8

(34) Attenuated/Dissimilative Reduction Patterns

Zhizdra-II:
- Non-flanked context (C-VC__́): all stressed vowels condition pretonic reduction to [a].
- Flanked context (C-VC-__́): stressed /i,u,e,o,ɛ,ɔ/ condition reduction to [a]; stressed /a/ conditions reduction to [i].

Kidusov (see fn. 8):
- Non-flanked context: all stressed vowels condition reduction to [a].
- Flanked context: stressed /i,u/ condition reduction to [a]; stressed /e,o,ɛ,ɔ,a/ condition reduction to [i].

Novoselkov (most common; see fn. 8):
- Non-flanked context: all stressed vowels condition reduction to [a].
- Flanked context: stressed /i,u,e,o/ condition reduction to [a]; stressed /ɛ,ɔ,a/ condition reduction to [i].

8 The Kidusov and Novoselkov Attenuated/Dissimilative patterns should not be confused with the similar-sounding but formally distinct Kidusov and Novoselkov Assimilative-Dissimilative reduction patterns.
As shown above, the Novoselkov pattern is a combination of attenuated [a]-reduction and the Obojan Dissimilative pattern. The Kidusov pattern combines attenuated [a]-reduction with the Don Dissimilative pattern, and the Zhizdra-II pattern combines attenuated [a]-reduction with the Zhizdra Dissimilative pattern. As noted, the Novoselkov pattern is the most common type of Attenuated/Dissimilative reduction.

(35) Rankings for Attenuated/Dissimilative Reduction Patterns

  Pattern                                 | Ranking
  No dissimilation at all ([a]-reduction) | *µµ/i,u » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » C-__C-/[+front], C-__/[+front], WSP
  Zhizdra-II                              | *µµ/i,u » *µµ/e,o » *µµ/ɛ » C-__C-/[+front] » *µµ/a » C-__/[+front], WSP
  Novoselkov (cf. Obojan)                 | *µµ/i,u » *µµ/e,o » C-__C-/[+front] » *µµ/ɛ,ɔ » *µµ/a » C-__/[+front], WSP
  Kidusov (cf. Don)                       | *µµ/i,u » C-__C-/[+front] » *µµ/e,o » *µµ/ɛ,ɔ » *µµ/a » C-__/[+front], WSP
Finally, if both Cʲ__Cʲ/[+front] and Cʲ__/[+front] are interleaved with the *µµ/X constraint family at different places, compound Dissimilative systems result. In compound dissimilation, one Dissimilative pattern occurs in the context Cʲ__C, and a second Dissimilative pattern occurs in the context Cʲ__Cʲ. Importantly, this analysis predicts that you cannot freely combine Dissimilative
Crosswhite—Vowel Reduction in Russian
variants to produce a compound system. Rather, the analysis predicts that the Dissimilative pattern used in the context Cʲ__Cʲ should always favor reduction to [i] under more conditions than the pattern found in the same dialect in the context Cʲ__C. Compound Dissimilative systems are in fact attested, and they do obey this generalization. The attested compound Dissimilative variants are illustrated below in (36), and the rankings that generate them are given in (37).

(36) Compound Dissimilative Systems

Sudzha:
  Cʲ__C V́ :  stressed í, ú, ó, ɔ́ condition reduction to [a]; stressed á conditions reduction to [i]
  Cʲ__Cʲ V́ :  stressed í, ú condition reduction to [a]; stressed é, ɛ́, ɔ́, á condition reduction to [i]

Shchigri:
  Cʲ__C V́ :  stressed í, ú, ó, ɔ́ condition reduction to [a]; stressed á conditions reduction to [i]
  Cʲ__Cʲ V́ :  stressed í, ú, é condition reduction to [a]; stressed ɛ́, ɔ́, á condition reduction to [i]

Dmitrov:
  Cʲ__C V́ :  stressed í, ú, ó condition reduction to [a]; stressed ɔ́, á condition reduction to [i]
  Cʲ__Cʲ V́ :  stressed í, ú condition reduction to [a]; stressed é, ɛ́, ɔ́, á condition reduction to [i]

(The vowels [e, ɛ] do not occur after plain consonants, and [o] does not occur after Cʲ, so the two environments show different stressed-vowel inventories.)
As illustrated, the Sudzha pattern combines the Zhizdra and Don Dissimilative variants, the Shchigri pattern combines Zhizdra and Obojan, and the Dmitrov pattern combines Obojan and Don. As these illustrations show, the compound Dissimilative variants share a common characteristic: the reduction pattern observed in contexts flanked by palatalized consonants is always one that produces reduction to [i] more often than the pattern it is paired with. For example, in the Sudzha compound pattern, reduction to [i] is observed between
palatalized consonants except when the stressed vowels are high ([í,ú]), whereas reduction to [i] is observed in the non-flanked environments only in the context of a stressed low vowel ([á]). In other words, the compound patterns are similar to the interrupted Dissimilative patterns discussed above: recall that interrupted Dissimilative variants are characterized by blockage of a Dissimilative pattern in order to have [i]-reduction in contexts flanked by palatalized consonants. The compound Dissimilative variants are similar: a Dissimilative pattern is blocked in contexts flanked by palatalized consonants in order to have a second Dissimilative pattern that favors reduction to [i] in a larger number of contexts.

(37) Rankings for the Three Attested Compound Dissimilative Variants

  Sudzha (cf. Zhizdra + Don):
    *µµ/i,u » Cʲ__Cʲ/[+front] » *µµ/e,o » *µµ/ɛ,ɔ » Cʲ__/[+front] » *µµ/a » WSP
  Shchigri (cf. Zhizdra + Obojan):
    *µµ/i,u » *µµ/e,o » Cʲ__Cʲ/[+front] » *µµ/ɛ,ɔ » Cʲ__/[+front] » *µµ/a » WSP
  Dmitrov (cf. Obojan + Don):
    *µµ/i,u » Cʲ__Cʲ/[+front] » *µµ/e,o » Cʲ__/[+front] » *µµ/ɛ,ɔ » *µµ/a » WSP
Of course, there are many more logically possible rankings similar to those shown in (37), but these are unattested. Furthermore, the ranking of WSP could also be varied in any of the hierarchies discussed in this section, correctly predicting the existence of dialects that use one of the three basic Dissimilative variants (Don, Obojan, Zhizdra) after non-palatalized consonants, but elsewhere use some other pattern, including Don, Obojan, Zhizdra, Zhizdra-II, Kidusov, Novoselkov, Sudzha, Shchigri, Dmitrov, Dissimilative/Attenuated I, or Dissimilative/Attenuated II (Stroganova and Bromlei 1986, maps #1 and #7).

3.1.8. An Alternative Analysis for Dissimilative Variants

As noted in the introduction, previous analyses of Dissimilative Russian vowel reduction have relied on the idea that the quality of the stressed vowel directly influences the surface quality of the preceding unstressed vowel (Halle 1965, Nelson 1974, Davis 1970, Suzuki 1998), using rules or constraints on dissimilation. The analysis presented here demonstrates that it is possible to sidestep the question of dissimilation by mediating the stressed vowel's influence on the surface quality of the preceding vowel via word prosody. This approach crucially relies on the idea that the more complex Dissimilative variants discussed in the immediately preceding section are properly described as the intersection of
simple Dissimilative patterns (Zhizdra, Obojan, and Don) with the effects of both single-sided and double-sided palatalization constraints, as already discussed. However, an alternative viewpoint has been suggested by Davis (1970) and Suzuki (1998) in which the direct influence of stressed vowel quality cannot be obviated. Confusion over the exact nature of the more complex Dissimilative variants results from the fact that the vowels /e,ɛ/ do not occur after non-palatalized consonants, due to accidents of the historical development of the Russian vowel system. Similarly, the vowel /o/ cannot occur after palatalized consonants. In other words, if the immediately pretonic vowel is in the singly-flanked environment Cʲ__C, the stressed vowel cannot be [é] or [ɛ́]; if the immediately pretonic vowel is in the doubly-flanked environment Cʲ__Cʲ, the stressed vowel cannot be [ó]. That is, the following sequences do not occur: *Cʲ__Cʲó, *Cʲ__Cé, *Cʲ__Cɛ́. These distributional facts have been indicated in the illustrations provided in the preceding sections by presenting separate stressed-vowel inventories for the environments Cʲ__C and Cʲ__Cʲ. It is important to note at this point that these distributional accidents cloud the proper characterization of certain dialectal reduction patterns by providing an (accidental) link between stressed vowel quality and consonant palatalization: the front mid vowels are linked specifically with the occurrence of palatalized consonants, while one of the back mid vowels is linked specifically with their absence. Only one of the mid vowels, /ɔ/, can occur after either palatalized or non-palatalized consonants.
This means that the description provided earlier for, say, the Dmitrov compound reduction pattern could be stated more concisely, without reference to consonantal environment, as indicated below:

(38) The Dmitrov Reduction Pattern: Expanded and Concise Descriptions

Expanded:
  Cʲ__C V́ :  stressed í, ú, ó condition reduction to [a]; stressed ɔ́, á condition reduction to [i] ([e, ɛ] do not occur in this context)
  Cʲ__Cʲ V́ :  stressed í, ú condition reduction to [a]; stressed é, ɛ́, ɔ́, á condition reduction to [i] ([o] does not occur in this context)
Concise:
  stressed í, ú, ó condition reduction to [a]; stressed é, ɛ́, ɔ́, á condition reduction to [i]
In the case of the Dmitrov pattern, the concise version, which does not differentiate between Cʲ__C and Cʲ__Cʲ, loses no information, since the stressed vowel [ɔ́] is the only mid vowel that can occur in both environments, and in this pattern [ɔ́] happens to behave uniformly in both. However, the same cannot be said for the Sudzha and Shchigri patterns, where the stressed vowel [ɔ́] does not behave uniformly. The full description of these patterns as provided earlier is repeated below:

(39) Sudzha and Shchigri Patterns, Expanded Version (Correct)

Sudzha:
  Cʲ__C V́ :  stressed í, ú, ó, ɔ́ condition reduction to [a]; stressed á conditions reduction to [i]
  Cʲ__Cʲ V́ :  stressed í, ú condition reduction to [a]; stressed é, ɛ́, ɔ́, á condition reduction to [i]

Shchigri:
  Cʲ__C V́ :  stressed í, ú, ó, ɔ́ condition reduction to [a]; stressed á conditions reduction to [i]
  Cʲ__Cʲ V́ :  stressed í, ú, é condition reduction to [a]; stressed ɛ́, ɔ́, á condition reduction to [i]

([e, ɛ] do not occur after plain consonants; [o] does not occur after Cʲ.)
The analyses of the Sudzha and Shchigri patterns provided by Davis and Suzuki depend on a "collapsed" description that loses information. The patterns as described by these two authors are basically as follows:
(40) Sudzha and Shchigri Patterns, Collapsed Version (Incorrect)

Sudzha (incorrect): a single stressed-vowel inventory í, ú, é, ɛ́, ó, ɔ́, á, with each stressed vowel assigned a single behavior (conditioning reduction to [a] or to [i]) regardless of consonantal environment.

Shchigri (incorrect): likewise a single inventory í, ú, é, ɛ́, ó, ɔ́, á, with one behavior per stressed vowel.

Comparing these two types of description shows that reliance on stressed vowel quality alone to condition the surface quality of the preceding vowel is inadequate: the collapsed descriptions lose track of the fact that the vowel [ɔ́] does not behave uniformly. Using consonantal environment as the
conditioning factor avoids this undesirable consequence. Alternatively, one could also correctly describe the Sudzha and Shchigri patterns by referring to the etymological origin of the different vowel qualities, distinguishing the /ɔ/ that derives from a back yer (*ъ) from the /ɔ/ that derives from /ɛ/ (historically, Cʲɛ́C > Cʲɔ́C, while Cʲɛ́Cʲ remained Cʲɛ́Cʲ). However, this is plainly impossible for a synchronic analysis.

The lone piece of evidence for the alternative viewpoint expressed by Davis and Suzuki is the existence of an unusual reduction pattern, usually referred to by the name Mosal'. The Mosal' pattern is as follows:

(41) The Mosal' Reduction Pattern
  Cʲ__C V́ :  stressed í, ú, ó condition reduction to [a]; stressed ɔ́, á condition reduction to [i] (the Obojan pattern)
  Cʲ__Cʲ V́ :  stressed í, ú, é, ɔ́ condition reduction to [a]; stressed ɛ́, á condition reduction to [i]

([e, ɛ] do not occur after plain consonants; [o] does not occur after Cʲ.)
As illustrated, the Mosal' pattern can be described as having the Obojan reduction pattern in singly-flanked palatalization environments, but the reduction pattern described for the doubly-flanked environments is not an attested form of Dissimilative reduction. This puts the Mosal' pattern in stark contrast with all the other Dissimilative variants discussed so far, which are either simple Dissimilative patterns or combinations of patterns that are independently attested in isolation. Indeed, according to Avanesov and Bromlei (1986, p. 103), the Mosal' pattern is poorly attested, and is shown on the DARJa dialect atlas only in co-occurrence with either the Sudzha or Zhizdra patterns. Based on the irregular nature of the Mosal' pattern, as well as its poor attestation in isolation, I suggest that the Mosal' pattern is either inappropriately described in the literature or is not the result of purely phonological factors.9 Whatever the exact nature of the Mosal' pattern, it is clear that Davis' and Suzuki's collapsed description of the pattern is incorrect. They describe Mosal' essentially as follows:

(42) Incorrect Description of the Mosal' Pattern

A single stressed-vowel inventory í, ú, é, ɛ́, ó, ɔ́, á, with each stressed vowel assigned a single behavior (conditioning reduction to [a] or to [i]) regardless of consonantal environment.
The error in presenting the Mosal' pattern in this way can be ascertained by examining the behavior of stressed vowels found after consonant clusters. For example, Nelson (1974) considers vowel reduction data taken from the actual field notebooks of Russian dialectologists working on dialect atlases for the Russian Academy of Sciences. Discussing the notebook entry describing a dialect with Mosal' compound Dissimilative reduction, Nelson notes:

In position [9] [i.e., in the context Cʲ__Cʲɛ́], [a] instead of the expected [i] was recorded several times in the fieldworker's
9. One appealing reanalysis, based on Ward's (1984) hypothesis concerning the historical development of the Mosal' pattern, is that the sequence identified by dialectologists as Cʲ__Cʲɔ́ (on largely etymological rather than phonetic grounds) is actually identified by native speakers as Cʲ__Cʲó, a phoneme shift that makes the phonotactic distribution of the phoneme /o/ more regular by eliminating the accidental gap *Cʲó discussed in the main text.
booklet. Strikingly enough and probably of significance is the fact that in every case where [a] was recorded a consonant cluster preceded the stressed vowel, the second member of which was invariably a resonant. (Nelson 1974, p. 166)

For example, the forms [tʲamnʲɛ́tʲ], [vtʲaplʲɛ́], [vvʲadrʲɛ́], and [svʲatlʲɛ́tʲ] were recorded. Under the collapsed description of Mosal', we would expect stressed [ɛ́] to directly condition reduction to [i] in the preceding syllable, predicting the incorrect forms *[tʲimnʲɛ́tʲ], *[vtʲiplʲɛ́], *[vvʲidrʲɛ́], and *[svʲitlʲɛ́tʲ]. The important observation here is that the first member of these consonant clusters is non-palatalized. Given this observation, the expanded description that differentiates between Cʲ__C and Cʲ__Cʲ makes the correct predictions: the immediately pretonic vowels in [tʲamnʲɛ́tʲ], [vtʲaplʲɛ́], [vvʲadrʲɛ́], and [svʲatlʲɛ́tʲ] are in the environment Cʲ__C (a non-flanked environment), and therefore reduce to [a]. The stressed vowel [ɛ́] can condition reduction to [i] only when the preceding vowel is surrounded by palatalized consonants.

In summary, an alternative analysis that depends on direct featural dissimilation to determine the surface quality of the preceding vowel makes correct predictions only for the Dmitrov reduction pattern; incorrect results are obtained for the Sudzha, Shchigri, and Mosal' patterns. Although the exact analysis of the Mosal' pattern remains elusive, due to the irregular reduction pattern observed when the immediately pretonic vowel is surrounded by palatalized consonants, the status of this pattern is questionable, and it is clear that even the Mosal' pattern relies on reference to consonantal environment.
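Nelson's point can be stated as a simple condition on the transcriptions. The following toy check is my own illustration (the helper name and the convention that "ʲ" marks a palatalized consonant are assumptions): the pretonic vowel counts as flanked only if the consonant immediately following it is itself palatalized, so a cluster whose first member is plain yields the non-flanked environment.

```python
# Toy check (my own illustration): in a transcription where "ʲ" marks a
# palatalized consonant, the pretonic vowel at position v_index is flanked
# (environment Cʲ__Cʲ) only if the consonant directly after it is palatalized.

VOWELS = set("aeiouáéíóúɛɔə")

def flanked(form, v_index):
    """True iff the vowel at v_index sits in the environment Cʲ__Cʲ."""
    preceded = v_index >= 2 and form[v_index - 1] == "ʲ"
    followed = (v_index + 2 < len(form)
                and form[v_index + 1] not in VOWELS   # next segment is a consonant
                and form[v_index + 2] == "ʲ")
    return preceded and followed

# In [tʲamnʲɛ́tʲ], the pretonic [a] follows tʲ but precedes plain [m]:
print(flanked("tʲamnʲɛ́tʲ", 2))   # False: non-flanked, so [a] is expected
# A hypothetical fully flanked form, e.g. [nʲasʲí]:
print(flanked("nʲasʲí", 2))       # True
```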
Finally, it should be emphasized that the very idea that Sudzha, Shchigri, Dmitrov, Mosal' (if it exists), and related complex Dissimilative variants rely on direct featural dissimilation rather than consonantal environment is in principle called into question by a single overarching consideration: all of the complex Dissimilative variants discussed only affect the reduction of unstressed vowels that follow palatalized consonants. Unstressed vowels that follow non-palatalized consonants have only four basic options: Obojan, Zhizdra, Don (a.k.a. Belgorod, cf. Ward 1984), or non-dissimilative reduction.
4. Conclusions

In the preceding sections, I have presented an analysis of the various two-pattern vowel reduction systems that are attested in the southern and central
dialects of Russian. This analysis presents a non-dissimilatory explanation for the Don, Zhizdra, and Obojan reduction patterns that can be easily converted to account for a range of other attested Russian vowel reduction patterns. The analysis also demonstrates that the two different vowel reduction patterns found in these dialects ("extreme" and "moderate") are in fact caused by two different types of phonetic motivation: the desire to avoid certain perceptually challenging vowel qualities (in this case, mid vowels), and the desire to avoid sonorous vowels in non-prominent positions (in this case, nonmoraic positions), observations formalized over 70 years ago by Jakobson (1929). The orthogonal nature of these two trends is especially clear from the analysis of [e]-reduction (section 3.1.4), where the two vowel reduction constraints, LicNonperiph/Stress and *Nonmoraic/-high, must have distinct rankings, and from the analysis of incomplete okan'e (section 3.1.6), where one of these vowel reduction processes is completely inactive (blocked).

Furthermore, it should also be pointed out that the rich variety of two-pattern vowel reduction systems attested in Russian dialects all follow a single generalization: the extreme vowel reduction patterns differ from the moderate vowel reduction patterns in disallowing certain sonorous reduction vowels, such as [a] or [e]. This observation meshes well with the proposed analysis for these dialects. The analysis of extreme vowel neutralization as prominence reduction predicts that it will never be the case that extreme vowel neutralization differs from moderate reduction in disallowing certain non-sonorous reduction vowels. This is especially clear when one compares the vowel sub-inventories most commonly observed in stressless position in Southern and Central Russian dialects: [i,u,a] in moraic unstressed syllables and [i,u,ə] in nonmoraic unstressed syllables.

This is a telling fact, since it is not the case that extreme reduction results in the preservation of fewer contrasts or the usage of a smaller subinventory. Instead, extreme reduction seems to be a completely independent type of vowel reduction process. And finally, by mediating the effect of stressed vowel quality in the Dissimilative dialects via foot form and the alignment of sonority and moraicity, a wide range of complex yet attested Dissimilative variants can also be accounted for without adding any additional formal machinery to the analysis.

These results shed light on several issues of phonological theory. For example, this analysis suggests that bounded feet can and do occur in languages that do not possess the usual indicators of this phenomenon, such as fixed stress placement or rhythmic secondary stress. The observed link across Russian dialects between foot form, vowel reduction, and phonetic phenomena such as non-phonemic vowel duration, devoicing, and deletion suggests that the
presence of bounded feet can be learned based on phonetic as well as phonological factors.

4.0.1. Beyond Russian: Evidence from Other Languages

Now that we have a clear picture of how vowel reduction works in Russian, we can investigate some linkages between vowel reduction in Russian and vowel reduction in other languages. In particular, many of the formal mechanisms investigated above are also useful in accounting for reduction in other languages, such as Catalan. Similarly, evidence from other languages, such as European Portuguese, helps to provide additional support for some of the formal mechanisms used in the analysis of Russian, such as the use of the moraic vs. nonmoraic distinction (rather than the footed vs. unfooted distinction) for extreme reduction.

4.0.1.1. Catalan

In standard Catalan (Mascaró 1978, Recasens 1991), unstressed syllables may not contain vowels other than [i,u,ə]. This contrasts with the situation found in stressed syllables, where the phonemic vowel inventory includes /i,u,e,o,ɛ,ɔ,a/. The neutralizations used to reduce the phonemic 7-vowel inventory to the 3-vowel subinventory [i,u,ə] are depicted below, with examples from Mascaró (1978):

(43) Vowel Neutralization in Catalan (data from Mascaró 1978):
  /i/ > [i];  /e, ɛ, a/ > [ə];  /u/ > [u];  /o, ɔ/ > [u]
  V under stress            Same V unstressed
  pɔ́rt    'harbor'          purtuári    'related to harbor'
  gós     'dog'             gusás       'big dog'
  ʎúm     'light'           ʎuminós     'light (adj.)'
  sák     'sack'            səkɛ́t       'small sack'
  pɛ́l     'hair'            pəlút       'hairy'
  sérp    'snake'           sərpɔ́t      'big snake'
  prím    'thin'            primá       'to make thin'
This pattern of reduction is similar to that seen in Russian extreme (non-immediately-pretonic) reduction: a vowel sub-inventory of [i,u,ə] is produced, using a reduction strategy that involves both raising and centralization. However, the Catalan reduction pattern differs in two important respects. First, the neutralizations utilized are the reverse of those seen in Russian: in Russian, the front mid vowel raises and the back mid vowel centralizes with /a/ to [ə]; in Catalan, the front mid vowels centralize with /a/ to [ə], while the back mid vowels raise. Second, in Catalan, these neutralizations are not part of a two-pattern reduction system. That is, there is no "moderate" reduction in Catalan. Note, for example, that [ə] can occur in the syllable immediately preceding the stress (cf. [səkɛ́t] 'little sack').

To account for the Catalan neutralization pattern, we can use basically the same reduction constraint that was seen in the Russian case. The only modification necessary concerns the conditioning environment: whereas in Russian we used *Nonmoraic/-high, in Catalan we must use *Unstressed/-high. It should be pointed out, however, that if we make the simplifying assumption that Catalan unstressed vowels are nonmoraic, we could use exactly the same reduction constraint for both languages. However, in the absence of any additional data supporting this claim for Catalan, I will make the less controversial assumption that all Catalan vowels are moraic, and simply modify the reduction constraint accordingly. (This is possible in Catalan, but not in Russian, since Catalan does not have a two-pattern reduction system.) The vowel reduction constraint used in Catalan will therefore be:

*Unstressed/-high: Unstressed syllables may not contain a vowel with a sonority greater than that of [i,u].
As mentioned above for Russian, the vowel [ə] is not more sonorous than [i,u], so the constraint given above will not be violated if [ə] occurs in an unstressed syllable. The other difference mentioned above, the different neutralization strategy seen in Catalan, is accommodated simply by changing the rankings of the vocalic faithfulness constraints with respect to the reduction constraint. Recall that in the Russian case, Max[-high] was high-ranked, making lowering/centralization the preferred reduction strategy. This remains the case in Catalan: if possible, vowels will reduce via centralization. In the Russian case, Max[+front] was also ranked high, causing the unstressed vowel /e/ to forego centralization in favor of raising, in order to preserve its underlying palatality. In Catalan, the situation is reversed: Max[+front] is ranked low, but Max[round] is ranked high. Thus, Catalan unstressed /o,ɔ/ will forego reduction-via-centralization for raising, in order to maintain their underlying rounding. Example tableaux are provided below (the first demonstrates reduction of a front mid vowel; the second, reduction of a back mid vowel):

  /pɛlút/          *UNSTRESSED/-high   MAX[ROUND]   MAX[-HI]
  ☞ a. pəlút
    b. pilút                                           *!
    c. pulút                                           *!
    d. pɛlút               *!
    e. pelút               *!
    f. polút               *!
    g. pɔlút               *!
    h. palút               *!

  /gɔsás/          *UNSTRESSED/-high   MAX[ROUND]   MAX[-HI]
  ☞ a. gusás                                           *
    b. gəsás                               *!
    c. gisás                               *!          *
    d. gosás               *!
    e. gɔsás               *!
    f. gesás               *!              *
    g. gɛsás               *!              *
    h. gasás               *!              *
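The ranking reversal between Russian and Catalan can be made concrete with a short script. This is an illustrative sketch under simplifying assumptions of my own (binary violations, a fixed candidate set, and a single cover constraint standing in for both Catalan *Unstressed/-high and Russian *Nonmoraic/-high); the function name is hypothetical.

```python
# Schematic OT evaluation: candidates are compared lexicographically on
# ranked constraints (strict domination). Rankings differ only in where the
# faithfulness constraints Max[round] and Max[+front] sit.

SONOROUS = set("eɛoɔa")   # vowels more sonorous than [i,u]; ə is excluded
ROUND = set("uoɔ")
HIGH = set("iu")
FRONT = set("ieɛ")

def reduce_unstressed(v, ranking):
    """Pick the surface reflex of unstressed /v/ from a fixed candidate set."""
    candidates = "iuəeɛoɔa"
    def viols(c):
        return {
            "*Unstr/-high": int(c in SONOROUS),
            "Max[round]":   int(v in ROUND and c not in ROUND),
            "Max[-hi]":     int(v not in HIGH and c in HIGH),
            "Max[+front]":  int(v in FRONT and c not in FRONT),
        }
    return min(candidates, key=lambda c: [viols(c)[r] for r in ranking])

catalan = ["*Unstr/-high", "Max[round]", "Max[-hi]", "Max[+front]"]
russian = ["*Unstr/-high", "Max[+front]", "Max[-hi]", "Max[round]"]

print(reduce_unstressed("ɛ", catalan))   # ə : Catalan front mid centralizes
print(reduce_unstressed("ɔ", catalan))   # u : Catalan back round mid raises
print(reduce_unstressed("e", russian))   # i : Russian /e/ raises, staying front
print(reduce_unstressed("o", russian))   # ə : Russian /o/ centralizes
```

Swapping the relative prominence of Max[round] and Max[+front] is all it takes to flip which mid vowels raise and which centralize, mirroring the tableaux above.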
Vowel reduction is also blocked in a few contexts in Catalan. These are discussed in chapter 7.

4.0.1.2. European Portuguese

In European Portuguese (Brakel 1985, Carvalho 1988-92), stressed syllables can contain the vowels /i,u,e,o,ɛ,ɔ,a/ and, with a limited distribution, /ɐ/. In unstressed syllables, however, only [i,u,ə] and sometimes [ɐ] can occur; the neutralizations that produce this subinventory are similar to those seen in Catalan: /e,ɛ/ > [ə], /o,ɔ/ > [u], /a/ > [ə] (or [ɐ]).

(44) Iberian Portuguese Vowel Reduction (Brakel 1985)

  /i/ > [i] (no change)    /u/ > [u] (no change)
  /e/ > [ə]   /ɛ/ > [ə]   /a/ > [ə]   /o/ > [u]   /ɔ/ > [u]

  Vowels Under Stress        Same Vowels Unstressed
  píʃku   'I blink'          piʃkár     'to blink'
  púlu    'I jump'           pulár      'to jump'
  méðu    'fear'             məðrózu    'fearful'
  pɛ́kə    'sins'             pəkár      'to sin'
  kátə    'picks up'         kətár      'to pick up'
  tɔ́ku    'I play'           tukár      'to play'
  bókə    'mouth'            bukəʁɐ̃́w̃    'big mouth'
Furthermore, vowel reduction in European Portuguese is also similar to that found in Catalan in that it is not part of a two-pattern reduction system. This being the case, we might be tempted to simply apply the same analysis sketched above for Catalan to European Portuguese. However, there is one important difference between the European Portuguese and Catalan vowel reduction systems that needs
to be addressed. Namely, vowel reduction in European Portuguese is blocked in unstressed syllables that end with a sonorant consonant. Examples of this sort of vowel-reduction blockage include the following forms (from de Carvalho 1988-92 and Brakel 1985).10

(45) Blockage of Vowel Reduction in European Portuguese
  syllable-final j:   bajʃár       tejmár
  syllable-final w:   kawzár       ewrɔ́pə
  syllable-final r:   ɐsúkar       kədáver
  syllable-final l:   faltár       voltár
  syllable-final n:   kõsəsɐ̃́w̃11    sentár
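The generalization in (45) can be phrased as a one-line predicate. This is a sketch with hypothetical helper names of my own, and it encodes only the tautosyllabic version of the condition:

```python
# Sketch (helper names mine): an unstressed European Portuguese vowel escapes
# reduction exactly when its own syllable is closed by a sonorant consonant.

SONORANTS = set("jwrlmn")

def escapes_reduction(syllable):
    """True if the unstressed syllable's vowel is immune to reduction."""
    return syllable[-1] in SONORANTS

# first syllables of faltár and sentár vs. the first syllable of pecar
print([escapes_reduction(s) for s in ["fal", "sen", "pe"]])   # [True, True, False]
```

Because the predicate inspects only the syllable itself, a sonorant that begins the next syllable is correctly ignored, matching the tautosyllabicity point made just below.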
It would be tempting to assume that blockage of vowel reduction in these syllables is simply due to the adjacency of a sonorant consonant; we might assume, for example, that V + sonorant is a combination that is particularly easy to articulate, or that it has some special perceptual advantage. However, it seems that sonorant consonants can only block vowel reduction when they are syllable-final. That is, intervocalic sonorant consonants do not block reduction of a preceding unstressed vowel; consider [dəlatór]12 (*[delatór]) and [kurɛtívu] (*[korɛtívu]), where the initial unstressed vowel undergoes reduction despite the following non-tautosyllabic sonorant (for blockage of reduction of the unstressed [a] and [ɛ] in these forms, see fn. 12). In fact, according to Brakel, a following non-tautosyllabic sonorant increases the likelihood that an unstressed vowel will be deleted: for example, deletion of the unstressed [u] in [pulár] is more likely than deletion of the [u] in [tukár]. (He also states, however, that the unstressed [u] in [tukár] is more likely to be devoiced.)
10. The behavior of syllable-final [r] seems to be unstable: in Brakel (1985) it does not block reduction, but in de Carvalho (1992) it does. It is possible that syllable-final /r/ is sometimes pronounced as a non-sonorant, as in Brazilian Portuguese, where syllable-final /r/ is pronounced [x].
11. The vowel /ɐ/ occurs in Iberian, but not Brazilian, Portuguese. It is minimally contrastive with [a]; the contrast is mainly limited to verbal desinences. The vowel /ɐ/ reduces in a manner identical to unstressed /a/.
12. The blockage of vowel reduction seen with the unstressed [a] and [ɛ] in these forms affects a number of derived forms, and is not associated with the preceding sonorant; see Brakel (1985) for discussion of this effect in Iberian Portuguese.
Blockage of vowel reduction in European Portuguese before a sonorant coda is an important point, since it introduces a parallel between European Portuguese and the two-pattern reduction system seen in Russian: in both languages, completely unstressed syllables must be divided into two groups based on their behavior with respect to reduction. In the Russian case, these two groups are (1) moderately-reducing unstressed syllables (the immediately pretonic and word-initial onsetless syllables) and (2) extremely-reducing unstressed syllables. In European Portuguese, the two groups are (1) unstressed syllables that are immune to reduction and (2) unstressed syllables that are not. The same formal device used to account for these two groups in the Russian case can also be applied to European Portuguese: some of the unstressed syllables are nonmoraic, while others are not. Namely, I claim that unstressed syllables in European Portuguese are nonmoraic unless they are closed by a sonorant consonant.

Before discussing this possibility, let's first look at an alternative that won't work: namely, that the syllables where reduction is blocked receive secondary stress. This alternative is similar to one proposed by Miller (1972) for Eastern Ojibwa: she proposes that all long vowels in that language receive some degree of stress, explaining why they are resistant to vowel reduction. Following this example, it might be possible to hypothesize that in European Portuguese all syllables closed by a sonorant consonant are heavy, similar to the case seen in Kwakw'ala (Boas 1947) and Inga Quechua (Levinsohn 1976). If this were the case, heavy syllables might attract secondary stress and therefore escape vowel reduction. However, although European Portuguese does possess secondary stress, its placement is not determined in the manner under consideration.
In current pronunciations as described by Lüdtke (1953) and de Carvalho (1988-92), secondary stress falls on the initial syllable of any word where there would otherwise be more than two unstressed syllables preceding the main stress, as in rèctangulár. Both sources also mention other, less common, patterns for placing secondary stress in European Portuguese, but none seem to place secondary stress on syllables closed by a sonorant. Consider, for example, the form vagàbundágem cited by Carvalho. Clearly, these examples show that the immunity of sonorant-final syllables to vowel reduction cannot be explained in terms of stress placement. It is, however, possible to explain the immunity of sonorant-final syllables to vowel reduction in terms of moraicity. For example, if the sonorant coda consonant is obligatorily moraic (as suggested above), the preceding vowel might share the consonantal mora, or may be prevented from undergoing demorification
in order to avoid a situation in which a coda consonant is moraically more prominent than the nuclear vowel of that same syllable. With this being the case, it would be possible to apply the *Nonmoraic/-high constraint of Russian to European Portuguese and predict the correct results: only nonmoraic unstressed vowels, that is, unstressed vowels not followed by a sonorant coda, will undergo reduction.

It should be noted that it seems phonetically reasonable to posit nonmoraic vowels for European Portuguese. Unstressed vowels (other than those that are immune to reduction) are phonetically similar to the non-immediately-pretonic vowels of Russian, in that they are extremely short and are commonly devoiced or deleted entirely (Brakel 1985, Carvalho 1988-92).

4.0.1.3. Additional Two-Pattern Systems

Finally, it should be noted that the approach taken to the two-pattern reduction system of Russian has repercussions for the prosodic structures of other languages with two-pattern reduction systems. In some languages with two-pattern reduction systems, the conditioning environment for extreme vs. moderate reduction corresponds to the difference between post-tonic and pre-tonic position. For example, in both Rhodope Bulgarian (Miletich 1936) and northern Italian dialects (Maiden 1995), any unstressed syllable that precedes the stress will undergo moderate reduction, while any unstressed syllable that follows the stress will undergo extreme reduction. This suggests that these languages use foot structures such as that illustrated below:

(46) Assumed Foot Structure for Rhodope Bulgarian and Northern Italian
  (σµ σµ σµ σµ σµ σµ σµ σ́µ) σ σ

Assuming that some high-ranking constraint in these languages requires footed syllables to be moraic, the pretonic unstressed syllables will be protected from demorification (*Struc-µ), while post-tonic unstressed syllables will not be. The proposed foot structure is also supported in the northern Italian case by data discussed by Maiden (1995).
He points out that there are several processes in addition to vowel reduction that are sensitive to the post-tonic vs. pre-tonic
difference (for example, certain types of vowel assimilation occur in pretonic, but not post-tonic, unstressed syllables in these dialects).13

Another example of a language with a two-pattern reduction system is Brazilian Portuguese (Redenbarger 1981, Dukes 1993). As described by Dukes, all stressed syllables (including both primary and secondary stresses) in Brazilian Portuguese are immune to vowel reduction.14 Secondary stresses are found on every other syllable to the left of the main stress: σ σ̀ σ σ̀ σ σ́ σ. Furthermore, those unstressed syllables that are found between stresses are subject to moderate vowel reduction (/ɛ/ > [e], /ɔ/ > [o]). That is, in the schema just given, the 3rd and 5th syllables would be subject to moderate reduction, but not the 1st or 7th. Unstressed syllables that do not occur between stressed syllables are subject to extreme reduction (/ɛ,e/ > [i], /o,ɔ/ > [u], /a/ > [ə]). Such syllables occur in two places: word-final unstressed position and word-initial unstressed position. Given our assumption that extreme reduction in two-pattern systems results from nonmoraicity, we must assume the following prosodic structure for Brazilian Portuguese:

(47) Brazilian Portuguese Prosodic Structure (proposed)
  σ (σ̀µ σµ) (σ̀µ σµ) (σ́µ) σ

Note that in any word with a penultimate main stress, the main-stress foot will be monosyllabic under this analysis: the syllable immediately preceding the stress is the weak member of the preceding syllabic trochee, and the following syllable is left unfooted. The unfooted status of the final syllable can easily be derived using Prince & Smolensky's (1993) Nonfinality constraint, which prohibits a foot from standing at the right edge of a word ("the right edge of a word may not align with the right edge of a foot"). This proposal is supported in Brazilian Portuguese by the fact that in words with an antepenultimate main stress, the
13
It should be pointed out, however, that Maiden proposes a tripartite prosodic structure for these words, in which the pretonic unstressed syllables constitute a prosodic domain apart from the stressed syllable. 14
The descriptions of moderate reduction and extreme reduction in Brazilian Portuguese provided, respectively, by Dukes (1993) and Redenbarger (1981) do not agree in all details. It seems as though dialectal variation regarding some aspects of this pattern.
www.ling.rochester.edu/wpls/s2000n1/crosswhite.pdf
Crosswhite—Vowel Reduction in Russian
163
penultimate syllable (unstressed) is subject to only moderate reduction, suggesting a foot form such as (σ¿µσµ)(σ#µσµ)σ, for example.
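The two-grade Brazilian Portuguese distribution described above (stressed syllables immune, between-stress syllables moderately reduced, peripheral unstressed syllables extremely reduced) can be restated procedurally. The following Python sketch is illustrative only; the function name and grade labels are mine, not part of the analysis:

```python
# Illustrative sketch: assign each syllable of a Brazilian Portuguese word
# a reduction grade from the positions of its (primary or secondary)
# stresses, following Dukes's description as summarized above:
#   - stressed syllables: immune to reduction ('none')
#   - unstressed syllables between two stresses: 'moderate'
#   - remaining unstressed syllables (word-initial/word-final): 'extreme'

def reduction_grades(n_syllables, stressed):
    """Return a reduction grade for each syllable (stresses are 0-indexed)."""
    stressed = set(stressed)
    grades = []
    for i in range(n_syllables):
        if i in stressed:
            grades.append('none')
        elif any(s < i for s in stressed) and any(s > i for s in stressed):
            grades.append('moderate')   # flanked by stresses on both sides
        else:
            grades.append('extreme')    # peripheral unstressed syllable
    return grades

# A six-syllable word with stresses on syllables 1, 3, and 5 (counting from 1):
print(reduction_grades(6, stressed=[0, 2, 4]))
# → ['none', 'moderate', 'none', 'moderate', 'none', 'extreme']
```

The 2nd and 4th syllables come out moderately reduced and the final syllable extremely reduced, matching the description of the pattern.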
4.1. Addendum: Notes to the Russian Analysis

The following notes concern details of the analysis of Russian vowel reduction that may be of interest to those scholars who are more familiar with the Russian vowel reduction patterns than the average reader would be.

Note 1: On Traditional Names for the Dialects

The dialects of Russian discussed here are all considered “folk dialects”—a term used in Russian dialectology to refer to dialects of Russian spoken in those areas traditionally inhabited by a Russian-speaking population, excluding major metropolitan areas. The patterns described here are based on the descriptions provided in Avanesov and Orlova (1964), Kuznetsov (1973), and Kasatkin (1989). These dialects are usually grouped into three large groups: the Northern, Central (or Mid), and Southern dialect groups. Each of these three dialect groups is associated with particular phonological characteristics. For example, the dialects in the Northern group tend either to lack vowel reduction or to have only limited vowel reduction. Strong reduction patterns are characteristic of the Central and Southern dialect groups. However, it should be noted that there is no such thing as “the” Northern dialect of Russian—each of these three dialect groups comprises a multitude of individual dialects, often showing significant variation from village to village. In addition, dialects vary not only with respect to vowel reduction, but also with respect to other phonological parameters (vowel inventory, consonant inventory, patterning of consonant clusters, accentual patterns, etc.), as well as a number of other linguistic parameters, including lexical, syntactic, and morphological characteristics. It is not the case, for example, that a specific vowel reduction pattern is associated with a single unique dialect. Instead, a given vowel reduction pattern might be attested in several individual dialects that differ significantly with respect to other parameters.
Therefore, Russian dialectologists do not refer to, e.g., “the [e]-reduction dialect of Russian”, but rather to “those dialects showing [e]-reduction”. Similarly, although the dialects that show a specific vowel reduction pattern tend to group geographically, these geographical groupings may be cross-cut by groupings based on other parameters—therefore, although there are some groups of dialects that are both linguistically similar and geographically compact (e.g., the “Vladimir-Volga Basin dialects”), geographically-based dialect names are usually linguistically uninformative. For example, the Obojan Dissimilative vowel reduction pattern was first noted outside the south Russian city of Obojan—however, other vowel reduction patterns are also noted in and around Obojan, and the Obojan pattern is
noted in numerous other southerly regions of Russia. Therefore, we can speak of the Obojan pattern of vowel reduction, or of dialects displaying the “Obojan pattern”—realizing that the collection of all dialects displaying this pattern might differ in their syntax, morphology, lexicon, etc. We can also speak of the “Obojan dialects”, which would refer to those dialects found in the geographical area of Obojan, regardless of whether or not they are linguistically similar. However, we cannot really speak of “the Obojan dialect”.

Note 2: The Reduction of Unstressed /e/

Throughout this work, reduction of unstressed /e/ to [i] in Russian is treated as a “direct” reduction fact. That is, it is not treated as the result of consonant palatalization. In the non-immediately-pretonic syllables of Russian dialects, vowel reduction is often profoundly affected by the palatality of the preceding consonant. In most dialects, /o,a/ reduce to [i] in non-immediately-pretonic syllables if the preceding consonant is palatalized, and reduce to [ə] in other non-immediately-pretonic syllables. This pattern presents something of a riddle for the analyst of Russian vowel reduction patterns, because unstressed /e/ also reduces to [i] after palatalized consonants. Is this because Russian reduces /e/ to [i] directly (similar to vowel reduction in Bulgarian, for example), or because of the influence of the preceding palatalized consonant? In order to decide between these possibilities, it is necessary to see how unstressed /e/ reduces when not preceded by a palatalized consonant. Unfortunately, due to the historical development of the Russian vowel system, /e/ does not occur in such positions, or does so only marginally—making this a largely academic question. Evidence from CSR indicates that /e/ > [i] is not due to the presence of a preceding palatalized consonant. In this dialect, /e/ can occur after some nonpalatalized consonants—namely /ts, š, ž/.
These consonants were historically palatalized, but subsequently lost palatalization. In most dialects, the nonpalatalized consonants /ts, š, ž/ still produce the vowel reduction patterns seen after palatalized consonants—unstressed /e,a,o/ reduce to [i].15 However, in some dialects, including CSR, /ts, š, ž/ do not have this behavior. In these dialects, /a,o/ reduce to [ə] after /ts, š, ž/, but /e/ reduces to [i]. Some examples of this pattern are given below using the consonant /ts/, where this behavior is most consistent:

(48) Reduction of /e/ > [i] and /o,a/ > [ə] After a Nonpalatalized Consonant

     gloss                        vowel under stress    same vowel unstressed
     unstressed e > i
     ‘church’ noun/adj.           [tsérkəf]             [tsirkóvnij]
     ‘whole’/‘in whole’           [tsélij]              [tsilʲikóm]
     unstressed o,a > ə
     ‘czar’/‘czar’s palace’       [tsár]                [tsəradvórʲits]
     ‘chirp’ noun/verb            [tsókət]              [tsəkatátʲ]

We can also see the direct reduction of /e/ to [i] in the nativized pronunciations of certain foreign words that contain /e/ at the absolute beginning of the word. For example, forms like [ékspərt] ‘export (noun)’ versus [ikspartʲírəvətʲ] ‘to export’ show that the e > i alternation is not a consonant~vowel assimilation process. The suffixed form /eksport-írovatʲ/ also has a more conservative pronunciation without vowel reduction on the initial vowel—this type of pronunciation sometimes gives the impression that the unreduced vowel is pronounced with a secondary stress, although such vowels are, in fact, unstressed. However, there is no variant pronunciation with reduction of /e/ to [ə] or [a], and native speakers find such a pronunciation unacceptable: *[akspartʲírəvətʲ]. Therefore, it can be assumed that in at least some dialects of Russian, e > i is a straightforward case of vowel neutralization under reduction, and not a consonant-vowel assimilation effect.

15
It is possible that this anomalous behavior is caused by the high tongue position characteristic of the /š, ž/ phonemes, which are strongly velarized in many dialects. For example, in the Old Muscovite pronunciation norm (which was prevalent earlier this century), /ž, š/ caused /o,a/ to reduce to [i]. These consonants were also produced with such noticeable velarization that some Russian phoneticians considered them to be doubly-articulated consonants (Akishina & Baranovskaja 1980). Currently, /š, ž/ do not cause reduction of /o,a/ to [i], and are no longer obligatorily velarized.
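The CSR pattern just described—unstressed /e/ > [i] but /a,o/ > [ə] after the nonpalatalized /ts, š, ž/—amounts to a small mapping. The following Python sketch is purely illustrative; the segment labels and function name are my own, not from the text:

```python
# Illustrative sketch of unstressed vowel reduction in CSR after the
# nonpalatalized consonants /ts, š, ž/, per the description above:
# /e/ reduces to [i] directly, while /o, a/ reduce to [ə].

HARD_SIBILANTS = {'ts', 'š', 'ž'}

def reduce_after(consonant, vowel):
    """Reduction of an unstressed vowel after the given consonant (CSR)."""
    if consonant in HARD_SIBILANTS:
        if vowel == 'e':
            return 'i'            # direct e > i, not palatalization-driven
        if vowel in ('o', 'a'):
            return 'ə'
    return vowel                  # other contexts are not modeled here

print(reduce_after('ts', 'e'))    # → i  (cf. 'church (adj.)')
print(reduce_after('ts', 'o'))    # → ə  (cf. 'chirp (verb)')
```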
Note 3: On Reduction after Palatalized Consonants

As mentioned in the text, the interplay of consonant palatalization and vowel reduction is somewhat more complex than depicted in this analysis. In this note, I shall make the complicating factors more explicit. As discussed below, an analysis that is in effect only a slight modification of the one presented in the main text will be sufficient, given proper assumptions about the input and output structures involved in this reduction pattern.
One complicating factor is that the unstressed back rounded vowel /u/ does not undergo fronting when preceded by a palatalized consonant, but the unstressed back rounded vowel /o/ does. That is, in unstressed position preceded by Cʲ, /o/ reduces to [i], but /u/ remains unreduced (surfacing as [u]). Phonetically speaking, the [u] that is found in such environments is produced with a much more forward tongue position than is stressed [u] or unstressed [u] not preceded by Cʲ. However, phonetic measurements such as those provided in Jones (1959) suggest that these fronted variants of [u] are not truly front vowels. The analysis of the interaction between the Cʲ/[+front] constraint, faithfulness constraints, and reduction constraints provided here is designed to generate surface [i] for unstressed /o/. However, it incorrectly predicts that unstressed /u/ should also surface as [i]. This unusual pattern of /o/ > [i] but /u/ > [u] derives historically from the fact that stressed /e/ became [o] when preceded by a palatalized consonant but not followed by one: CʲéC > CʲóC. However, in unstressed position, /e/ remained [e], where it was eventually subject to the reduction phenomenon /e/ > [i]. In words with etymological /e/ that experience stress shifts, this resulted in the surface alternation of stressed [ó] and unstressed [i]:

     CʲóC                            CʲiC                           gloss
     [žóni] (pl.)                    [žiná] (sg.)                   ‘wife’ (nom. case)
     [tʲópləj] (long form, masc.)    [tʲipló] (short form, neut.)   ‘warm’
The pattern resulting from this historical development can be described synchronically as follows: a non-front vowel will reduce to [i] after Cʲ iff doing so simultaneously involves raising (for the variant [e]-reduction pattern, substitute “lack of raising”). In other words, violation of Max[-high] triggers the constraint Cʲ/[+front]. This suggests an alternative analysis of Russian reduction after Cʲ that utilizes constraint conjunction. This alternative will be considered here due to its generality of application, although a simpler alternative that seems applicable for at least some dialects of Russian will also be suggested. As discussed by Crowhurst and Hewitt (1997), constraint conjunction can be used to generate “triggering” effects of the sort under consideration. In this case, the Cʲ/[+front] constraint can be conjoined with Max[-high]: [Max[-high] ^ Cʲ/[+front]]. This sort of conjoined constraint is violated only in case both conjuncts are violated simultaneously. In other words, an output candidate that violates one or the other of the conjuncts, or neither, will not violate the conjoined constraint as a whole. This means that the conjoined constraint has nothing to say about the reduction of /i/ or /u/, since they will always vacuously satisfy at least
one of the conjuncts, since they do not possess an underlying [-high] specification (i.e., Max[-high] is vacuously satisfied). Unstressed /i/ and /u/ are therefore incapable of violating the conjoined constraint [Max[-high] ^ Cʲ/[+front]], resulting in a situation where /u/ is immune to consonant palatalization effects in unstressed positions, but /e/, /o/, and /a/ are not. In other words, the conjoined constraint will make no distinction between /u/ > [i] and /u/ > [u]—both satisfy the conjoined constraint. However, other constraints, namely Max[round], prefer the faithful candidate [u]. Obviously, this solution is somewhat awkward. And indeed, it appears as though either the conjoined constraint or the pattern that it attempts to replicate is being ousted by native speakers of CSR. Consider, for example, the fact that the /o/ > [i] reduction pattern is not being extended to new words, as illustrated by the following derived forms:

     simplex form     gloss                       derived form           gloss
     [rajón]          ‘administrative region’     [rəjanʲírəvətʲ]        ‘develop into a rajon’
     [pəvʲilʲón]      ‘pavilion’                  [pəvʲilʲanʲírəvətʲ]    ‘develop into a pavilion’

The first derived word included in the table above, [rəjanʲírəvətʲ], is an existing, newly-formed word of Russian, which is pronounced as indicated. The expected pronunciation with /o/ > [i] is not observed: *[rəjinʲírəvətʲ]. The second derived form, [pəvʲilʲanʲírəvətʲ], is a word I constructed and presented to native speakers to test whether the [rəjanʲírəvətʲ] pattern would also occur in new formations after Cʲ as well as after /j/. The native speakers polled all preferred [pəvʲilʲanʲírəvətʲ] (as indicated in the table). The alternative pronunciation *[pəvʲilʲinʲírəvətʲ] was uniformly rejected, and was even rated as worse than a form lacking reduction altogether (*[pəvʲilʲonʲírəvətʲ]). This pattern suggests that in at least some dialects of Russian (including CSR), the existing cases of /o/~[i] alternation might be morphophonologically conditioned. It could be, for example, that certain lexical items are listed with two different stem variants, with morphophonological rules determining which stem is to be used in a given context. That is, a case of alternation like [žóni]~[žiná] (‘wives’~‘wife’) could be accounted for by using the stem variant /žon-/ in one case, and the stem variant /žen-/ in the other. Further scrutiny of the /o/ > [i] alternation in CSR suggests that this hypothesis should not be taken lightly. For example, it is well known that derivatives of words with [o]~[i] alternations will surface with stressed [é] in certain morphological categories: cf. [žénʲskʲij] ‘feminine’, for example.
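The triggering behavior of the conjoined constraint can be made concrete. The following Python sketch uses my own simplified feature encoding (it is not an implementation from the text); it shows why /u/ vacuously satisfies the conjunction while a raised-but-unfronted candidate for /o/ violates it:

```python
# Illustrative sketch of local constraint conjunction:
# [Max[-high] ^ Cʲ/[+front]] is violated only when BOTH conjuncts are
# violated by the same candidate.  Feature sets are my simplification.

HIGH = {'i', 'u'}     # [+high] vowels: no underlying [-high] to preserve
FRONT = {'i', 'e'}    # [+front] vowels

def violates_max_nonhigh(underlying, surface):
    # Max[-high]: an underlying [-high] specification must survive.
    return underlying not in HIGH and surface in HIGH

def violates_palatal_front(palatalized_before, surface):
    # Cʲ/[+front]: a preceding palatalized consonant wants a front vowel.
    return palatalized_before and surface not in FRONT

def violates_conjunction(underlying, surface, palatalized_before):
    return (violates_max_nonhigh(underlying, surface)
            and violates_palatal_front(palatalized_before, surface))

# For /o/ after Cʲ, raising to back [u] violates the conjunction,
# but raising to front [i] does not:
print(violates_conjunction('o', 'u', True))   # → True
print(violates_conjunction('o', 'i', True))   # → False
# /u/ vacuously satisfies Max[-high], so no candidate for /u/ can
# violate the conjunction; Max[round] then keeps faithful [u]:
print(violates_conjunction('u', 'i', True))   # → False
```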
Furthermore, recent dictionaries of CSR list variant forms with either stressed [ó] or stressed [é] for some words using this type of stem—for example, both [žónʲin] and [žénʲin] are attested forms for the possessive adjective ‘wife’s’. If this morphophonological analysis can be extended beyond CSR, a much simpler analysis, very similar to the one provided in the main text above, would suffice. Under this scenario, it would be the case that only /e/ and /a/ reduce to [i] after palatalized consonants. This means that one could use the simple, nonconjoined version of Cʲ/[+front] presented in the main text (“a palatalized consonant must be followed by a [+front] vowel in unstressed position”). The ranking of Max[-front] above Cʲ/[+front] would block /u/ > [i] and /o/ > [i]. Given an underlying representation for /a/ that is unmarked for [front], the Cʲ/[+front] constraint would affect unstressed /a/, but not unstressed /o/. Reduction of unstressed /o/ to [a] after Cʲ (as in [rəjanʲírəvətʲ]) would require only the assumption that surface [a] derived from unstressed /o/ is specified [-front] (to be faithful to the underlying [-front] specification of /o/), while underlying /a/ is not.
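The simpler, non-conjoined alternative can be sketched in the same illustrative style. Here the ranking Max[-front] >> Cʲ/[+front] is modeled as a lexicographic comparison of violation profiles; the encoding and the two-candidate sets are mine, and the underspecification of /a/ for [front] follows the assumption stated above:

```python
# Illustrative sketch of the ranking Max[-front] >> Cʲ/[+front] for
# unstressed vowels after a palatalized consonant.  /o/ and /u/ carry an
# underlying [-front] specification; /a/ is assumed unspecified for [front].

MINUS_FRONT = {'o', 'u'}
FRONT = {'i', 'e'}

def violations(underlying, surface):
    """(Max[-front], Cʲ/[+front]) violation marks for one candidate."""
    max_front = underlying in MINUS_FRONT and surface in FRONT
    palatal_front = surface not in FRONT
    return (max_front, palatal_front)

def winner(underlying, candidates):
    # Tuple comparison is lexicographic, so the first constraint dominates.
    return min(candidates, key=lambda s: violations(underlying, s))

print(winner('a', ['i', 'ə']))   # → i  (/a/ fronts after Cʲ)
print(winner('o', ['i', 'ə']))   # → ə  (Max[-front] blocks /o/ > [i])
print(winner('u', ['i', 'u']))   # → u  (Max[-front] blocks /u/ > [i])
```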
References

Alderete, John. 1995. Faithfulness to Prosodic Heads. ms., University of Massachusetts, Amherst.
Al'mukhamedova, Z.M. and P.Ë. Kul'sharipova. 1980. Reduktsiia glasnikh i prosodiia slova v okaiushchikh russkikh govorakh. Kazan'. [Vowel Reduction and Word Prosody in Okaiushchi Russian Dialects]
Avanesov, R.I. 1984. Russkoe literaturnoe proiznoshenie. Prosveshchenie: Moscow. [Russian Literary Pronunciation]
Avanesov, R.I. and V.G. Orlova. 1964. Russkaia dialektologiia. Prosveshchenie: Moscow.
Boas, Franz. 1947. “Kwakiutl grammar with a glossary of the suffixes,” Transactions of the American Philosophical Society, 37:3, 201-377.
Bondarko, L.V., L.A. Verbitskaia, and L.R. Zinder. 1966. “Akusticheskie kharakteristiki bezudarnosti” [“Acoustic characteristics of stresslessness”], 56-64, in Strukturnaia tipologiia iazykov. Nauka: Moscow.
Brakel, Arthur. 1985a. “Reflections on the analysis of exceptions to the rule of Iberian Portuguese vowel reduction,” Hispanic Linguistics, 2:1, 63-85.
Brakel, Arthur. 1985b. “Towards a morphophonological approach to the study of linguistic rhythm,” Proceedings of the Chicago Linguistics Society, vol. 21, 15-25.
Broselow, Ellen, Su-I Chen, and Marie Huffman. 1997. “Syllable weight: Convergence of phonology and phonetics,” Phonology, vol. 14, 47-82.
Carvalho, Joaquim Brandão de. 1988-92. “Réduction vocalique, quantité et accentuation: pour une explication structurale de la divergence entre portugais lusitanien et portugais brésilien,” Boletim de Filologia, vol. XXXII, 5-26.
Crosswhite, Katherine. 1999. Vowel Reduction in Optimality Theory. Ph.D. dissertation, UCLA.
Davis, Phillip W. 1970. “A classification of the dissimilative jakan'e dialects of Russian,” Orbis, Bulletin international de Documentation linguistique, vol. XIX, no. 2, 360-76.
Dukes, Beatriz. 1993. “Vowel Reduction and Underspecification in Brazilian Portuguese,” UCLA Occasional Papers in Linguistics.
Halle, Morris and Jean-Roger Vergnaud. 1987. An Essay on Stress. Cambridge: MIT Press.
Halle, Morris. 1959. The Sound Pattern of Russian. Mouton and Co.: 's-Gravenhage.
Halle, Morris. 1965. “Akan'e, the treatment of unstressed nondiffuse vowels in Southern Great Russian dialects,” in Symbolae Linguisticae in honorem Georgii Kurylowicz, ed. by Heinz et al., 103-09. Polska Akademia Nauk: Wroclaw.
Hubbard, Kathleen. 1995. “Toward a theory of phonological and phonetic timing: evidence from Bantu,” 168-187, in Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV, ed. by Bruce Connell and Amalia Arvaniti. Cambridge University Press: Cambridge.
Jakobson, Roman. 1929. Remarques sur l'évolution phonologique du russe comparée à celle des autres langues slaves. Republished in Selected Writings of Roman Jakobson (1971), vol. 1, Phonological Studies, 7-116. The Hague: Mouton.
Jones, Lawrence. 1959. “The Contextual Variants of the Russian Vowels,” 157-197, in The Sound Pattern of Russian, Morris Halle. 's-Gravenhage: Mouton.
Kasatkin, L.L. 1989. Russkaia dialektologiia. Prosveshchenie: Moscow.
Kasatkina, R.F. 1996. “Srednerusskie govory i ritmika slova” [“Central Russian dialects and word rhythm”], in Prosodicheskii Stroi Russkogo Iazyka, ed. by T. Nikolaeva. RAN: Moscow.
Kasatkina, R.F. and E.V. Shchigel'. 1995. “Assimiliativno-dissimiliativnoe akan'e,” in L.L. Kasatkin (ed.), Problemy Fonetiki II, 295-309. RAN: Moscow.
Kasatkina, R.F., et al. 1996. “Osobennosti prosodii slova v iuzhnorusskikh govorakh” [“Characteristics of word prosody in South Russian dialects”], in Prosodicheskii Stroi Russkogo Iazyka, ed. by T. Nikolaeva. RAN: Moscow.
Kuznetsov, P.S. 1973. Russkaia dialektologiia. Prosveshchenie: Moscow. [Russian Dialectology]
Lamprecht, Arnosht. 1987. Praslovanshtine. Brno.
Levinsohn, Stephen H. 1976. The Inga Language. The Hague: Mouton.
Lüdtke, H. 1953. “Fonemática portuguesa: II – Vocalismo,” Boletim de Filologia, vol. 14, 197-217.
Maiden, Martin. 1995. “Evidence from the Italian Dialects for the Internal Structure of Prosodic Domains,” in Linguistic Theory and the Romance Languages, ed. by John Charles Smith and Martin Maiden, 115-31. Benjamins: Amsterdam.
Mascaró, Joan. 1978. Catalan Phonology and the Phonological Cycle. Indiana University Linguistics Club: Bloomington, Indiana.
McCarthy, John and Alan Prince. 1993. Prosodic Morphology I: Constraint Interaction and Satisfaction. Technical Report 3, Rutgers University Center for Cognitive Science.
Miletich, L. 1936. “‘Akanie’ i ‘pulnoglasie’ vu tsentralniia rodopski govoru,” Makedonski pregledu, vol. X, no. 1-2.
Miller, Patricia. 1972. “Vowel Neutralization and Vowel Reduction,” 482-489, in Papers from the Eighth Regional Meeting, Chicago Linguistic Society, ed. by Peranteau, Levi, and Phares. Chicago Linguistic Society: Chicago.
Nelson, James Platt. 1974. Vowel Phonology in Russian Dialects: First Pre-Stress Syllable in Dialects Characterized by Akan'e. Ph.D. dissertation, University of Illinois at Urbana-Champaign.
Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative Grammar. TR-2, Rutgers University Cognitive Science Center. To appear from MIT Press.
Redenbarger, Wayne J. 1981. Articulator Features and Portuguese Vowel Height. Harvard Studies in Romance Languages, v. 37.
Repetti, L.D. 1989. The Bimoraic Norm of Tonic Syllables in Italo-Romance. Los Angeles: University of California.
Smolensky, Paul. 1993. Harmony, Markedness, and Phonological Activity. Presentation given at Rutgers Optimality Workshop 1.
Stroganova, T.Ju. and S.V. Bromlei (eds.). 1986. Dialektologicheskii Atlas Russkogo Iazyka (Tsentr Evropeiskoi Chasti SSSR). Vypusk I: Fonetika. AN SSSR: Moscow.
Suzuki, Keiichiro. 1995. “Double-sided Effect in OT: Sequential Grounding and Local Conjunction,” Proceedings of the South Western Optimality Theory Workshop 1995, 209-224.
Suzuki, Keiichiro. 1998. A Typological Investigation of Dissimilation. Ph.D. thesis, University of Arizona.
Vysotskii, S.S. 1973. “O zvukovoi strukture slova v russkikh govorakh” [“On the sound structure of the word in Russian dialects”], in Issledovaniia po russkoi dialektologii.
Zemskaia, E.A. 1987. Russkaia razgovornaia rech': lingvisticheskii analiz i problemy obucheniia. Russkii iazyk: Moscow.
Zlatoustova, L.V. 1981. Foneticheskie edinitsy russkoi rechi. Izdatel'stvo Moskovskogo universiteta: Moscow.
Katherine Margaret Crosswhite 514 Lattimore Hall Dept. of Linguistics University of Rochester Rochester, NY 14627
[email protected] http://www.ling.rochester.edu/people/cross