Things Aren’t Always What They Seem: The Acquisition of Opacity
MARC ETTLINGER
University of California, Berkeley
December 9, 2005

1. INTRODUCTION

When teaching phonological theory to undergraduates, one of the most challenging areas for them to master is the concept of rule opacity. The two types of rule opacity outlined in Kiparsky (1973), counter-bleeding and counter-feeding, are schematized in (1a) and (1c) respectively:

(1) Two types of rule opacity
               a) Counter-   b) Trans-   c) Counter-   d) Trans-
                  bleeding      parent      feeding       parent
               /si-a/        /se/        /se-i/        /si/
s → ʃ / _ i     ʃi-a          ―           ―             ʃi
i → e / _ a     ʃe-a          ―
e → i / _ i                               si-i          ―
               [ʃea]         [se]        [sii]         [ʃi]

The challenge these data present to learners of linguistic theory is as follows: the previously established technique of determining alternations by writing down the environments each phone appears in no longer applies, as phones will appear in environments they seemingly shouldn’t. In the above example, the linguistics student would have been happily establishing the complementary distribution of s and ʃ, noting that [ʃ] always appears before [i] while [s] never does, based on the examples of transparent rule application (1b, 1d). This then leads them to posit the rule /s/→[ʃ] / _ i. Then the student encounters surface forms like those in (1a, c) (generally neatly collected towards the end of the word list). The forms [ʃea] and [sii] run counter to the previously established generalization that [ʃ] should appear before [i] and only before [i]. The naïve student, dismayed, may be tempted to erroneously throw her hard-earned s→ʃ/_i generalization in the trash. Luckily, though, most students can grasp the concept of crucial rule ordering and the idea of rule opacity – that a subsequent rule can render a previous rule’s application opaque – once the final exam comes around. The language learner has to deal with this same problem, but without the benefit of explicit pedagogy.
How does the language learner extract the generalization that [ʃ] precedes [i] in this language when encountering [sii] in (1c), and that [s] precedes all other vowels when encountering examples like [ʃea] as in (1a)?
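The four derivations in (1) can be simulated directly with ordered string-rewrite rules. The sketch below is only illustrative: it uses 'S' as an ASCII stand-in for the palatal fricative, and the rule set is the schematic one from (1), not any real language.

```python
import re

def apply_rules(ur, rules):
    """Apply each (pattern, replacement) rewrite in order, returning
    every intermediate step of the derivation."""
    form, steps = ur, [ur]
    for pattern, repl in rules:
        form = re.sub(pattern, repl, form)
        steps.append(form)
    return steps

palatalize = (r"s(?=i)", "S")   # s -> S / _ i
lower      = (r"i(?=a)", "e")   # i -> e / _ a
raise_     = (r"e(?=i)", "i")   # e -> i / _ i  (underscore: 'raise' is a keyword)

# (1a) counter-bleeding: lowering removes palatalization's trigger
#      only AFTER palatalization has applied -> opaque [Sea]
print(apply_rules("sia", [palatalize, lower]))   # ['sia', 'Sia', 'Sea']
# (1c) counter-feeding: raising creates palatalization's trigger
#      too late -> opaque [sii]
print(apply_rules("sei", [palatalize, raise_]))  # ['sei', 'sei', 'sii']
# (1d) transparent application -> [Si]
print(apply_rules("si", [palatalize, lower]))    # ['si', 'Si', 'Si']
```

Reversing the order of the two rules in either opaque derivation yields the transparent outcome, which is the sense in which the ordering, not the rules themselves, carries the opacity.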

This learning challenge of abstracting generalizations that aren’t necessarily true of all surface forms is paralleled by the current puzzle in Optimality Theory (OT, Prince & Smolensky 1993) of accounting for this same sort of data in theoretical phonology. Discussed in more detail in §2.2, OT is a theory that uses ranked, violable constraints on output representations to derive the most harmonic output form for a given input. The ranking is theoretically the same for all of the phonological outputs in the language. Instead of a rule of the form [+cont, -son]→[-ant] / __ [+high] to account for high vowels palatalizing preceding fricatives, OT would instead have a constraint against unpalatalized fricatives before high vowels – let’s call this constraint PAL. Constraints of this type – markedness constraints – compete against faithfulness constraints that require that the surface form match the input. An example of a faithfulness constraint would be IDENT(PAL), which requires that the identity of the palatal feature of each segment in the output match the identity of this feature on the corresponding segment in the input. The reasons such an approach is problematic for the schematic language in (1) can be observed by comparing the violations of the PAL and IDENT(PAL) constraints in the forms in (1a-d). Example (1d) [ʃi] shows that adhering to the PAL constraint is more important than the pressure to adhere to a form’s underlying representation, IDENT(PAL) – otherwise the output would have simply been [si]. Thus, we can say that PAL outranks IDENT(PAL), or in OT shorthand PAL » IDENT(PAL). The problem arises when looking at the form in (1c). The surface form [sii] violates PAL. This would normally be okay in OT, as all constraints are violable, but only in the service of satisfying a higher-ranked constraint. For this particular surface form, however, there are no conceivable constraints that [sii] satisfies better than [ʃii].
The only reason for [s] appearing in the output is that the identity of the palatal feature outranks the constraint against unpalatalized fricatives before high vowels, PAL. Thus, this data point suggests that IDENT(PAL) » PAL – the opposite of the previous ranking. This is a ranking paradox, and it violates a basic tenet of the theory: that the ranking is the same for the entire grammar. Several theoretical modifications to standard OT that account for opacity have been forwarded recently; these are discussed in §3. I argue, however, that each of these theories presents major problems for the language learner and can not be acquired using general learning processes and theories of OT learning. These theories are also problematic in light of a basic finding in linguistic ontogeny: that the acquisition of the phonological grammar precedes any awareness of morphology. Instead, I forward an alternate theory of opacity that integrates an abstract phonological grammar, like that of OT, with an exemplar-based phonetic instantiation of the output of the grammar. The main virtue of this approach is that post-lexical opacity becomes transparent in the phonological grammar and is rendered opaque only in the phonetic instantiation. This grammar can be learned using children’s capability of regularizing input (Hudson Kam and Newport, 2005) when the ratio of transparent to opaque rule application is adequately large. The learning of constraints can then be appropriately modeled using the Gradual Learning Algorithm (Boersma and Hayes, 2001), which crucially can learn rankings in light of exceptions.
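The ranking paradox can be made concrete with a toy constraint evaluator. The constraint definitions below are my own schematic renderings of PAL and IDENT(PAL), again with 'S' standing in for the palatal fricative; a real analysis would state them over feature matrices.

```python
# PAL penalizes an unpalatalized fricative before [i]; IDENT_PAL
# penalizes changing the (un)palatalized status of a segment.

def PAL(inp, cand):
    return cand.count("si")              # every [si] sequence is a violation

def IDENT_PAL(inp, cand):
    return sum(1 for a, b in zip(inp, cand)
               if {a, b} == {"s", "S"})  # each s <-> S change is a violation

def evaluate(inp, candidates, ranking):
    """Return the candidate with the best (lexicographically smallest)
    violation profile under the given constraint ranking."""
    return min(candidates,
               key=lambda c: [con(inp, c) for con in ranking])

ranking = [PAL, IDENT_PAL]               # PAL >> IDENT(PAL), motivated by (1d)
print(evaluate("si", ["si", "Si"], ranking))     # 'Si'  -- correct, as in (1d)
print(evaluate("sii", ["sii", "Sii"], ranking))  # 'Sii' -- but [sii] is attested
```

The second call is the paradox in miniature: the ranking justified by (1d) selects the wrong winner for the counter-feeding form in (1c), and reversing the ranking breaks (1d) instead.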

Prospectus – MARC ETTLINGER –December 9, 2005

The theory of opacity that I am adopting adheres to a strong form of lexicon optimization (Prince & Smolensky 1993) whereby the language learner posits that underlying representations are the same as the surface forms for all lexical items. This obviates the need to account for any sort of opacity in the lexicon itself. The learner can still learn the phonological generalizations in the grammar, even when she hypothesizes that the input equals the output, because in the OT learning algorithm, markedness constraints are assumed to be initially ranked higher than faithfulness constraints. The markedness constraints are demoted only when the learner encounters forms that suggest that particular phonotactic sequences are allowed in the language. This is discussed in the section on OT phonotactic learning, §4.1. After about eight months, the learner arrives at the full phonotactic grammar by learning what phonotactic sequences are valid while still assuming that inputs are the same as outputs. In this first stage, again, there is no opacity in the OT grammar. In the next stage, beginning at about one to two years of age, depending on the language (Ferguson & Slobin XXX), the learner discovers morphology. This now presents a whole new set of problems, as the surface representation of the same morpheme will appear differently depending on the context, and there can not, therefore, be a single uniform underlying representation common to all surface forms. For example, while the pre-morphologically aware learner posits /kɪsɪz/ as the underlying representation for [kɪsɪz] kisses (facilitating the acquisition of the phonotactic ban on tautosyllabic adjacent stridents), she must now attempt to form a single underlying representation for the plural morpheme, which can surface as either [s], [z] or [ɪz] – conditioned, not coincidentally, by the already learned phonotactic generalizations.
A whole new set of constraints enters the grammar as well, as there is now pressure to maintain paradigm uniformity – pressure to maintain some resemblance between all surface forms of the same morpheme. Using the schematic example above, the opacity in (1a) [ʃe-a] could be due to the pressure to maintain some resemblance between the stem of [ʃe-a] and the uninflected (1d) [ʃi]. This is one of the areas for further development in the prospectus. The most radical departure from current theories made in this prospectus concerns the opacity that arises from post-lexical processes. The suggestion here is that these processes are not part of the OT grammar itself, but rather are the result of the phonetic realization of the phonological grammar (Barnes 2002, Blevins 2004, Kavitskaya 2002). To account for this type of opacity, I suggest that this phonetic component of the grammar be explained by a theory of perception by exemplars (Johnson & Mullenix 1997, Pierrehumbert 2001). At this post-lexical level, the instantiation of any arbitrary phonological category is subject to influence by linguistic and paralinguistic factors, and the actual acoustic output may be strikingly different from the phonological category it represents. The phonetic output can therefore render the underlying phonological generalizations opaque while in the grammar itself the interactions are transparent. The opacity is at the border of the phonetics/phonology interface. This differs from the grammatical opacity discussed above in that there should be some evidence – either from careful phonetic comparison of the acoustic outputs or through alternations found in careful versus rapid versus otherwise affected speech (as shown in the case of flapping in Hammond 1990) – that clues the learner in to the deeper phonological category. This also requires that the language learner is able to learn phonological patterns independent of phonetic naturalness. Experimental evidence

seems to show that this doesn’t present a problem (Seidl and Buckley, 2005), and typological evidence seems to show that there isn’t as much of a bias toward natural classes and natural phonology as previously believed (Mielke 2004, Mortensen 2005). Using examples of opacity primarily from English, but from some other common languages as well,1 I first show, in section 3, that current theories of phonological opacity can not account for its acquisition using general learning mechanisms and instead must resort to a priori, extrinsic algorithms that don’t correlate with what we know about child language acquisition. The one possible theory that is learnable, comparative markedness (McCarthy 2002), is problematic for a different reason: while it can account for counter-bleeding that arises from so-called derived environment effects, it can not account for a different type of counter-bleeding opacity, harmonically bound opacity. In the subsequent section (§4.1), as part of my proposal, I contrast the two current major theories for learning OT grammars, Recursive Constraint Demotion (Tesar and Smolensky, 2000) and the Gradual Learning Algorithm (Boersma and Hayes, 2001). RCD is hopelessly doomed to failure on encountering opaque output without the benefit of morphological knowledge – something the learner doesn’t have until later. The GLA, however, was designed to be able to handle exceptions and is well suited to learn opacity even without the benefit of morphology, as long as the frequency of the data is in the right proportions. There are several well-warranted criticisms of the GLA, but they are generally leveled against the grammatical model associated with it – Stochastic OT – and not the learning algorithm itself.
Decoupling the two provides a learning algorithm that can replicate a key finding in the recent psycholinguistic literature – that children generalize irregular input as categorical – a finding that is a crucial component of this hypothesized theory of opacity learning. This is discussed in §4.2 as part of a criticism of including variation in the abstract grammar. Finally, in section 5, I propose that an exemplar-based approach to the phonetic realization of the abstract phonological categories accounts for the final instances of opacity – post-lexical opacity – that can be found in grammar. The reason for positing that this level lies outside the OT grammar rests on the idea that things like variation, affected speech and the effects the articulatory system has on the grammar can’t be modeled correctly, as doing so would introduce a specious set of constraints into the OT grammar correlating to social effects, for example (Keller & Asudeh 2002). Recent experimentation has shown that these variables are in fact captured by linguistic exemplars – detailed, episodic memory traces of the phonetic input. Since paralinguistic information is captured in exemplars and since post-lexical effects are conditioned by these variables, I suggest that the post-lexical level of the grammar be captured by an exemplar-based theory. This proposition suggests that there will be phonetic evidence for this type of opacity, such as that discussed in Hammond (1990) on the phonetics of flapping in English or in Bergen (2002) on the myriad linguistic and para-linguistic factors conditioning French liaison.
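The error-driven update at the heart of the GLA can be sketched in a few lines. This is a simplified rendering of Boersma and Hayes (2001): each constraint carries a numeric ranking value, evaluation adds Gaussian noise to those values, and on a mismatch the constraints favoring the learner’s wrong output are promoted while those penalizing the datum are demoted. The constraint names and violation table below are invented for illustration.

```python
import random

def evaluate(values, candidates, violations, noise=2.0):
    """Rank constraints by ranking value plus evaluation noise, then
    pick the candidate with the best violation profile."""
    order = sorted(values, key=lambda c: values[c] + random.gauss(0, noise),
                   reverse=True)
    return min(candidates, key=lambda cand: [violations[c][cand] for c in order])

def gla_update(values, correct, learner, violations, plasticity=0.1):
    """On an error, demote constraints that penalize the correct datum
    and promote constraints that penalize the learner's wrong output."""
    for con in values:
        if violations[con][correct] > violations[con][learner]:
            values[con] -= plasticity
        elif violations[con][learner] > violations[con][correct]:
            values[con] += plasticity

# Invented toy data: two constraints, two candidates, datum always [Si].
random.seed(1)
violations = {"PAL": {"si": 1, "Si": 0}, "IDENT": {"si": 0, "Si": 1}}
values = {"PAL": 100.0, "IDENT": 100.0}
for _ in range(1000):
    out = evaluate(values, ["si", "Si"], violations)
    if out != "Si":
        gla_update(values, "Si", out, violations)
print(values["PAL"] > values["IDENT"])  # True: PAL ends up ranked above IDENT
```

Because updates only widen the gap between the two values, categorical data drives the values far enough apart that the evaluation noise becomes irrelevant; with mixed-frequency data the values settle closer together, which is the property exploited in §4 for learning in light of exceptions.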

1 Despite the desire to use a typologically broad sample of languages, the ability to obtain phonetic measurements and conduct psycholinguistic experiments on native speakers with an adequate sample size motivated using common languages.

But first, a definition of opacity is needed.

2. DEFINING OPACITY

In this section I forward a definition of opacity that focuses on how it presents a challenge for the language learner, rather than a definition rooted in a particular theoretical framework. In the course of doing so, several examples of opacity from common languages – Canadian English, a dialect of American English, Finnish, Korean and German – are discussed. This sample set of languages – with the focus on English examples – was chosen partly because of the availability of speakers for potential future phonetic and psycholinguistic experimentation.

2.1. GENERATIVE PHONOLOGY

After SPE (Chomsky & Halle 1968), opacity constituted a central object of study within the generative framework, including several important early writings of Kiparsky (1968, 1972, 1973). His definition of opacity, which formed the basis of most subsequent work, defined two types of opacity as shown in (2) (paraphrased from Kiparsky 1973):

(2) A phonological rule A→B/C_D is opaque to the extent that there are surface forms in the language having either:
a) A in the environment C_D
b) B in any environment other than C_D

An example of the first type is found in certain dialects of American English (Donegan & Stampe 1979), which forms a nice contrast with a corresponding transparent interaction of the same rules in other dialects. First, coda nasals before other coda segments are deleted after nasality spreads onto preceding sonorants, as in (3):

(3) English nasal deletion
a) [plæ̃t] plant
b) [stæ̃p] stamp
etc.

Compare with coda nasals before non-coda stops:
c) [fæ̃n.təm]2 phantom
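The definition in (2) can be operationalized as a simple scan over surface forms. The function below is schematic – A, B, C, D are treated as literal segment strings rather than feature matrices, and 'S' stands in for the palatal fricative of (1) – but it classifies forms into exactly the two opacity types of (2).

```python
def opacity(surface_forms, A, B, C, D):
    """Return the set of opacity types a rule A -> B / C _ D exhibits
    in a corpus of surface forms: 'a' if A occurs in C_D (surface
    underapplication), 'b' if B occurs outside C_D (overapplication)."""
    types = set()
    for form in surface_forms:
        for i, seg in enumerate(form):
            left = form[i - len(C):i]           # empty C matches trivially
            right = form[i + 1:i + 1 + len(D)]
            if seg == A and left == C and right == D:
                types.add("a")
            if seg == B and not (left == C and right == D):
                types.add("b")
    return types

# Schematic language of (1), rule s -> S / _ i (C is empty, D = "i"):
print(opacity(["Si", "se"], "s", "S", "", "i"))  # set() – transparent forms
print(opacity(["sii"], "s", "S", "", "i"))       # {'a'} – counter-feeding output
print(opacity(["Sea"], "s", "S", "", "i"))       # {'b'} – counter-bleeding output
```

Counter-feeding thus surfaces as type (a) and counter-bleeding as type (b), which is the correspondence the prose examples below walk through for American English.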

There is also a process of intervocalic flapping that interacts with the nasal deletion: transparently for some speakers, but opaquely for others:

(4) Two types of interactions between flapping and nasal deletion:
a) Transparent Dialect:
                      /plænt-ɪt/
    Nasal Deletion     plæ̃tɪt
    Flapping           plæ̃ɾɪt
                      [plæ̃ɾɪt]

2 I believe this needs to be verified with careful phonetic measurement. Either way, it doesn’t affect the subsequent discussion.

The Acquisition of Opacity plntt ― pltt

b) Opaque Dialect: Flapping Nasal Deletion

If the rule is formulated as in (5), then the form in (b) is opaque in that we find A, a [t], in the environment C_D, between two vowels.3

(5) {t, d} → ɾ / V _ V
      A       B    C   D

An example of the second type of opacity is found in Canadian English and certain dialects of American English (Joos 1942, Chomsky & Halle 1968, inter alia). The phonological rule in question is the raising of diphthongs before voiceless obstruents, as shown in the rule in (6):4

(6) Canadian Raising:
a) [aɪ] → [ʌɪ] / __ [-son, -voice]
     A       B           C_D

This rule is apparent when comparing the minimal pair write and ride. Before the voiceless [t], the diphthong is raised, [rʌɪt], whereas before the voiced [d], it isn’t, [raɪd]. When these stems are suffixed with [ɪŋ], the stops become flaps, rendering the first rule opaque. The surface forms for writing and riding are [rʌɪɾɪŋ] and [raɪɾɪŋ], respectively. The flapping rule changes the environment of the raising rule so that, according to Kiparsky’s formulation, we have the appearance of “B” [ʌɪ] before a flap [+son, +voice], which is not in the environment “C_D” of the rule in (6). The counter-bleeding relationship results in an apparent overapplication of raising in [rʌɪɾɪŋ]:

(7) Counterbleeding
             write      writing      ride       riding
UR           /raɪt/     /raɪt-ɪŋ/    /raɪd/     /raɪd-ɪŋ/
Raising      rʌɪt       rʌɪt-ɪŋ      ―          ―
Flapping     ―          rʌɪɾɪŋ       ―          raɪɾɪŋ
SR           [rʌɪt]     [rʌɪɾɪŋ]     [raɪd]     [raɪɾɪŋ]

One reason for the interest in opacity among generative phonologists was, in fact, the challenge it presents for the learnability of the phonological system. While formalizations of the learning problem were not expressed in detail, opacity was intended as “a measure of one of the properties of a rule which determine how hard it is to learn: the ‘distance’ between what the rule says and the phonetic forms in the languages” (Kiparsky 1973, p. 79).

3 Not discussed in the original formulation of Donegan & Stampe (1979), nor in the subsequent rehashing in McCarthy (2002), is the fact that the nasal deletion is in an opaque counter-bleeding relationship with the spreading of nasality to adjacent sonorants as well.
4 An analogous process exists in many other dialects of American English, but with an alternation between long and short vowels. Thus, the difference between ‘writer’ and ‘rider’ is [rajɾɚ] and [ra:jɾɚ].

The two types of opacity translate into either the under- or over-application of a generalization on the surface. In the case of American English nasal deletion and flapping, this is reflected in the difficulty the learner would have in formulating the flapping rule. In almost every case, she’d find that flaps only occur between vowels and that every t and d surfaces as a flap intervocalically, as when comparing pat [pʰæt] and pat it [pʰæɾɪt]. However, forms like [plæ̃tɪt] violate this generalization, and the exception isn’t attributable to the nasality on the vowel (the remnant of the original underlying nasal), because forms like natty [næ̃ɾi] exhibit both spreading of nasality from the initial surface nasal and flapping. In the case of Canadian raising, the difficulty for the learner arises in extracting the generalization reflected in rule (6) when encountering an example like [rʌɪɾɪŋ], where the raising rule seems like it shouldn’t have applied based on the surface form: the raised diphthong occurs before a flap [+son, +voice] rather than in the rule environment [-son, -voice]. To acquire this generalization in a rule-based framework, the learner must undo the flapping rule first to understand why the diphthong raised in [rʌɪɾɪŋ] writing and not [raɪɾɪŋ] riding. If this had to be done for every rule, it would make the process of rule learning untenably difficult: each derivational rule would have to be undone for every generalization for each and every word form. Opaque rule relations also concerned generative phonologists in that they violated the principle of maximum rule application, stymieing efforts to establish intrinsic rule ordering – a supposedly natural rule ordering that consistently applied across all languages and adhered to some general principle.
Kiparsky (1968) noted that in historical changes involving the reordering of rules, the tendency was almost always for rules to reorder themselves so as to maximize their application. Counter-feeding rule ordering tended to switch to feeding, and bleeding tended to switch to counter-bleeding.5 For example, in standard Finnish, there are two rules in a counter-feeding relationship with each other. A diphthongization of long mid vowels precedes the deletion of medial voiced continuants, as in (8). Notice that the diphthongization isn’t able to apply because the long vowel appears in the derivation after the diphthongization rule. Now, in certain innovative dialects, the rule ordering is reversed, as in (9), where the diphthongization can apply, thereby maximizing rule application.

(8) Standard Finnish
                                     /teɣe/
a) diphthongization                   ―
b) VC[+cont, +voice]V deletion        tee
                                     [tee]

(9) Innovative Finnish dialects
                                     /teɣe/
a) VC[+cont, +voice]V deletion        tee
b) diphthongization                   tie
                                     [tie]

5 The formulation here states that counter-bleeding – the opaque rule ordering – is more natural than bleeding. Not much will be said in this prospectus on this matter, but it does pose an interesting question: is it transparency or maximum rule application that is preferred?

Despite this tendency and the desire to establish an intrinsic, cross-linguistic rule ordering, counter-feeding and bleeding rule order persists in many languages, placing the additional burden on the learner of not only having to learn a set of rules, but an extrinsic order of rule application as well. Counter-feeding and bleeding rule ordering was considered marked as compared to the unmarked, learnable and supposedly natural ordering of maximum rule application. Maximum rule application ordering also facilitated conspiracies, where a diverse set of rules all had the same effect on the output. Conspiratorial effects, like the elimination of CCC clusters, not only seemed to be the result of different rules in the same language, but also of rules across languages. This was one of the insights that led to the adoption of Optimality Theory (OT, Prince & Smolensky 1993) and the abandonment of the rule, and with it the importance of intrinsic and extrinsic rule ordering.6

2.2. OPTIMALITY THEORY

OT is an analytical framework in which the pronounced form of an utterance (the output) is selected from multiple possible candidates that are all simultaneously evaluated and directly compared to the lexical representation (the input) by a ranked set of universal and violable constraints (Prince and Smolensky 1993 and McCarthy and Prince 1993 are the best original sources for OT; Kager 1999 is an excellent reference which summarizes the standard theory). Aside from doing a better job of capturing cross-linguistic generalizations, OT is preferable to rule-based theories of phonology because it is a more psychologically plausible theory. Whereas derivations with potentially dozens of ordered rules have no conceivable correlate in the neural system, OT’s parallel architecture is based on neural networks (Prince & Smolensky 1993).
The details aren’t crucial at this juncture; only the larger point that the formalism is part of an effort to understand phonology as a cognitive process and not only a tool for capturing generalizations in a system.7 An important part of that goal is to develop a theory of acquisition. Chomsky’s (2001) formulation of what constitutes explanatory adequacy includes the caveat that one can transcend explanatory adequacy – solving the logical problem of language – by proving that the theory is also acquirable with a minimal UG. That has been done with the basic instantiation of OT, but not with the many alternatives hypothesized to account for opacity (see section 3).

6 But not, however, the importance of opacity and the generalization that opaque rule ordering is marked and transparent rule ordering unmarked. This is addressed further in section 3.2.
7 The “richness of the base” component of OT (RoB) is often cited as psychologically implausible if not impossible. If RoB suggested that the speaker generated an infinite set of candidates for each word, this would indeed be a problem. This is addressed by the idea of evaluating “contenders” instead of candidates (Samek-Lodovici & Prince 1999, Riggle 2004), which is the set of possible winners. The idea is analogous to the fact that the linguist needs to only supply a few candidates to justify a constraint ranking – the rest of the infinite set of candidates can be eliminated through deduction. Contenders are derivable from an algorithm, the input form and the constraint set. For example, if *ʘ (no labial clicks) is higher ranked than all faithfulness constraints, [ʘa] doesn’t need to be evaluated for the input form /ba/. Similarly, any candidate that logically has a superset of the violations of any contender can not be a contender itself.

In an output-based theory like OT, the importance of opacity is magnified, as the generalizations need to be stated at some seemingly abstract, non-surface level of representation, something that doesn’t exist in standard OT. This is reflected in the selection of incorrect forms by a standard OT grammar. For the English nasal deletion and flapping, the relevant constraints are *NC]σ, a constraint against nasals followed by other coda segments, and *VTV, a constraint against intervocalic stops. The tableaux for /plænt/ and /bʌtɚ/ are shown in (10) and (11):

(10) *NC]σ » FAITH
/plænt/        | *NC]σ | MAX-IO
(a) plænt      |  *!   |
(b) ☞ plæ̃t    |       |  *

(11) *VTV » FAITH
/bʌtɚ/         | *VTV | IDENT(SON, VOICE)
(a) bʌtɚ       |  *!  |
(b) ☞ bʌɾɚ     |      |  *

In (10), the form [plæ̃t] shows that adhering to the constraint against coda nasal + stop clusters is more important than keeping all of the segments from the input in the output, the condition enforced by MAX-IO. The fact that this markedness violation is not repaired by changing the features of any of the segments shows that the other relevant faithfulness constraints are ranked higher than MAX-IO. Similarly, in (11), the surface form [bʌɾɚ] shows that the constraint against intervocalic stops, *VTV, is higher ranked than the constraint maintaining the voicing quality and sonority of the segment. Turning, now, to the form [plæ̃tɪt], we can combine the two tableaux, giving the tableau in (12).8

(12) Incorrect selection of [plæ̃ɾɪt] over [plæ̃tɪt]

*NC]σ *VTV /plnt t/ *! (a) plntt (b) plt *! (c) plnt *! (d) pltt

IDENT(SON, VOICE) MAX-IO * *

* *

8 The ✓ symbol indicates the form that should have been chosen by the tableau, while the ☞ marks the incorrect form the grammar has selected.

The only way to get the right form would be to rank IDENT(SON, VOICE), the constraint that keeps the /t/ a [t], above *VTV – a ranking paradox given the *VTV » IDENT(SON, VOICE) ranking from (11). Formalism aside, what this essentially asks is: if the flapping of [t] between vowels is indeed a productive process, then why isn’t the [t] flapped in [plæ̃tɪt] as it is in other words? In formalizing the acquisition of a phonological grammar, the problem is that a learner can’t learn a grammar that doesn’t work. This shortcoming of standard OT has not gone unaddressed, however. Significant research has gone into the investigation of how OT can deal with opacity, and in the next section I discuss a few of the proposals, with particular attention to the problems they pose for a learning theory of grammar.

3. CURRENT PROPOSED THEORIES OF OPACITY

The study of opacity in OT is an extremely fertile area with an ever-growing number of alternatives to consider. The past few years in particular have seen myriad theories proposed, none coming to the fore as a definitive approach. There is a likely reason for this: what was treated as a singular phenomenon in a rule-based theory is unlikely to have a singular answer in a completely different framework (Ito & Mester 2003). In this section I discuss a few of these approaches with an eye towards assessing their learnability, as well as raising any major empirical deficiencies that have been noted in the literature. The theories assessed are sympathy theory (McCarthy 1997), constraint conjunction (Ito & Mester 2003), comparative markedness (McCarthy 2002) and LPM (or stratal) OT (Kiparsky 2000, Bermudez-Otero 1999). In many cases, a challenge to learnability in one theory is the same as for other theories not discussed.
The strategy of using an abstract, non-overt form to assess the optimality of candidates in sympathy theory, for example, can also stand in for the abstract representations forwarded in virtual phonology (Bye 2001) or the abstract domains in optimal domains theory (Cole & Kisseberth 1994) that have no surface correlate. This is by no means a complete assessment of the state of the art for approaches to opacity, but rather gives an idea of the challenge to learnability that most of these theories present. In most cases the problem is either the explosion in the number of constraints that would be needed, reliance on an abstract form the learner will never hear, or a reliance on morphology – something the learner doesn’t have until later. In the last two cases, constraint conjunction and comparative markedness, there is also a class of counter-bleeding data that the theories can not account for. Any complete theory of phonology will need to account for the diversity of opaque data; empirical completeness is not the aim of this investigation, however. The aim is to find a theory that is most in line with our current understanding of general learning processes.

3.1. SYMPATHY THEORY

Sympathy theory (McCarthy 1997) was one of the earliest attempts at solving the opacity puzzle in OT and has therefore been subject to the greatest amount of empirical scrutiny. The theory hasn’t stood up particularly well and has been essentially ruled out (Ito & Mester 2003, Levi 2001), as evidenced by its omission from McCarthy’s (2002) comparison of comparative markedness to other theories of opacity. Sympathy theory maintains the parallelist approach of

evaluating all constraints at the same time by introducing sympathy constraints that require faithfulness to a sympathy candidate (marked by a ❀). The sympathy candidate, one of the failed candidates, is the most harmonic of all failed candidates that also adheres to a selector constraint, marked by a ✯. For [rʌɪɾɪŋ] writing, the standard OT tableau in (13) expresses the idea that a grammar that prefers a flap to a [t] intervocalically, and raised diphthongs only before voiceless stops, should prefer the form [raɪɾɪŋ] to [rʌɪɾɪŋ], because there is no reason to raise the diphthong prior to the surface flap. There is no possible re-ranking of the constraints that would make [rʌɪɾɪŋ] the optimal candidate, because it has a super-set of violations.

(13) Incorrect selection of [raɪɾɪŋ] over [rʌɪɾɪŋ]

/rat/ (a) rat (b) rit (c)  rai (d) ri

*aT *(!)

*VTV *(!) *!

IDENT(SON, VOICE) IDENT(LO) * * *

*!

In the Sympathy theoretic account, the constraint maintaining the identity of the underlying /t/ selects the sympathy candidate, ❀[rʌɪtɪŋ], which is the most harmonious candidate amongst those that have a surface [t]. The sympathy constraint requires that the optimal candidate have the same diphthong as the sympathy candidate, [rʌɪtɪŋ]. With this new constraint, the correct output form, [rʌɪɾɪŋ], is selected because the intervocalic stop is flapped and the quality of the diphthong of the sympathy candidate is preserved. The upshot is that this new grammar selects the candidate with the same vowel quality as the most harmonic candidate that maintains the quality of the intervocalic [t], mutatis mutandis.

(14) Selection of correct candidate via sympathy theory

IDENT(LO) *aT /rat/ *! *(!) (a) rat (b) rit *! (c) rai (d) ri

*VTV IDENT(SON, VOICE) IDENT(LO)  *(!)  * *! * * *!

There are a number of problems with sympathy theory – empirical, acquisition-related and theoretical in nature. The empirical problems stem from sympathy theory's inability to deal with rule sandwiching, Duke-of-York derivations and similar effects. For example, Levi (2000) shows that the prediction that "[if] two notionally distinct processes … violate exactly the same faithfulness constraints, then they must always act together in rendering a third process opaque" (McCarthy 1999) is false, using data from the Mizrahi dialect of Modern Hebrew.

The Acquisition of Opacity

There are two processes deleting segments in codas, one for glottal stops, ʔ, and the other for pharyngeal fricatives, ʕ. What makes them two separate processes is that in a serial derivation they occur at different times, as can be seen in the two forms in (15).

(15) Opaque and transparent segment deletion in Mizrahi Hebrew

    UR              /itpaleʔti/     /itpareʕti/
    ʔ → Ø / _]σ     itpaleti        ―
    e → a / _CC     ―               itparaʕti
    ʕ → Ø / _]σ     ―               itparati
    SR              [itpaleti]      [itparati]

The form itpaleti] is transparent in that there are no violations of any constraints in the surface form while [itparais opaque: The reason for the lowering of /e/ to [a] is obscured by the subsequent deletion rule. These rule are in a counter-bleeding relationship. The reason this can not be accounted for with Sympathy theory is that the two different processes violate the same faithfulness constraint, MAX-IO, which militates against segment deletion. This can be seen in the two tableaux in (16) and (17) taken from Levi (2000). If sympathy theory is set up to accommodate opacity between two constraints, in this case constraint on vowel height and the constraint against deletion, then those two constraints must always be in an opaque relationship with each other. The data show otherwise. (16) Selection of correct opaque candidate via sympathy theory 

*]

σ

*]

σ

*eCC IDENT(LO) MAXC IDENT(LO)



*!

*

*



Sympathetic 

*!







Faithful Transparent Opaque



*!



*

* *

*

(17) Selection of incorrect transparent candidate via sympathy theory 

*]



*!

Sympathetic 

*!

Faithful

σ

*]

σ



*eCC IDENT(LO) MAXC IDENT(LO)

*

*









*

*

*

Transparent   Opaque

*!



*

Aside from these empirical deficiencies, there are also several problems with sympathy theory from a psycholinguistic perspective.


First, assuming the interpretation of "richness of the base" in footnote 7, where only contenders are evaluated instead of the infinite set of all possible candidates – which is what makes OT psychologically plausible – it may be that the sympathy candidate is not even considered.9 In (13), [raɪɾɪŋ] harmonically bounds [rʌɪɾɪŋ] in that it has a subset of its constraint violations – it has no constraint violations the other form doesn't have. Therefore, [raɪɾɪŋ] will always win (without any additional constraints added to the system) when the two are compared, regardless of the constraint ranking. According to Samek-Lodovici and Prince's (1999) formulation, [rʌɪɾɪŋ] shouldn't even be a contender. Therefore, it isn't in the pool of forms the speaker/hearer is considering, obviating any possibility of using it as a sympathy candidate or, in this case, using it to learn what the sympathy and selector constraints need to be. The learner, aside from never actually hearing the sympathy candidate, won't even consider it. Coetzee (2002) suggests otherwise and makes the case that, given the sympathy theory architecture, one can readjust the contender set to include any necessary missing forms. The critical flaw in his logic, however, is that he assumes an already-determined selector constraint allowing the inclusion of rejected candidates as contenders. The learner, on the other hand, has to actually learn what the selector constraint is. This circularity makes sympathy theory untenable. What the objection amounts to is a challenge to the notion that the learner is able to use an abstract form that isn't part of her linguistic input data to evaluate candidates. Without hearing the abstract form overtly, the learner has no way of knowing what the criteria should be to select it. So, sympathy theory suffers from empirical and learnability problems, eliminating it as a possible psychologically real solution to opacity.

3.2. LPM-OT

LPM-OT (Kiparsky 2000, Bermúdez-Otero 1999), also known as stratal OT, links three distinct OT grammars serially.10 The input to the first stratum is the UR, and its output serves as the input to the second stratum, and so on. The three levels correspond to the stem, word and post-lexical levels of lexical phonology (Booij 1997 inter alia) and thus are said to be externally motivated. To generate the example from above, [rʌɪɾɪŋ] writing, the ranking in the first stratum would result in the diphthong raising on the stem alone, /raɪt/ → rʌɪt, as shown in (18). The motivation for placing this constraint ranking effect in the first stratum is that this is a generalization found in the lexicon for all stems, and not just of words, as evidenced by lexical items like mitre [mʌɪɾɚ] where the diphthong is raised (Bermúdez-Otero 2003), as contrasted with post-lexical word

9 An additional objection relating to richness of the base (Ito & Mester 2003) concerns the fact that sympathy theory cannot account for allophonic variation as the first process. The winner, in sympathy theory, is always more faithful to the input than would otherwise be expected based on the transparent interaction of constraints. With allophonic variation, however, the input can be either of the two allophones (RoB) and the output should be the same, since the grammar selects the correct allophone, not the UR. In that case, sympathy would predict different outputs based on the input.
10 It should be noted that, coinciding with the development of OT, two other constraint-based theories of phonology, Lakoff (1993) and Goldsmith (1993), also used a similar three-level system for dealing with opacity. For some reason, they are rarely referenced.


combinations like [laɪ fɚ mi] lie for me, and with word-level suffixes like [aɪfl̩] eyeful where it is not (cf. the stem-level suffix -er: [lʌɪfɚ] lifer). (18) Stratum 1: Stem-level phonology

/rat/ (a) rat (b) rit

*aT IDENT(LO) *(!) *

The output is then combined with the /ɪŋ/ suffix and subject to word-level constraints. In this instance, nothing changes in the second stratum, and rʌɪt-ɪŋ now serves as the input to the final, post-lexical stratum, where flapping occurs. This is accomplished via the ranking in (19). The motivation for placing this ranking in the post-lexical stratum and not in the previous stratum is that the flapping effect is also found across word boundaries, as in hit it [hɪɾɪt]. (19) Stratum 3: Post-lexical phonology

    /rʌɪt-ɪŋ/        *VTV   ID(SON,VOICE)
    (a) rʌɪtɪŋ       *!
    (b) ☞ rʌɪɾɪŋ            *

At first glance this seems like an ideal solution to the problem of opacity. Any empirical challenge can be met as long as no more than three levels are required, and as of yet no instance of three sequential opaque relations has been found. The criticisms of LPM-OT are not empirically based, but rather theoretical and learning-based. Ito & Mester (2003) raise the objection that LPM-OT is too permissive in its predictions regarding the types of opacity found in language. In fact, there are no predictions, aside from suggesting that the opaque "rules" must correlate with the distinct stem, word and post-lexical levels. There are no restrictions on what can happen between levels, given the completely free re-ranking from one stratum to the next, and even for an example like mitre [mʌɪɾɚ] where there is no morphology, one can appeal to the three levels of grammar if need be. An approach of this sort ends any further investigation into what the consistencies across opaque relationships may be. This approach also fails to capture the insight of Kiparsky's earlier work discussed above, where opaque relationships11 are marked as compared to their transparent order. In LPM-OT, there is no reason for one to be preferred to the other, as opacity and transparency are equally stable and theoretically accounted for. Opacity and transparency are no different in their formulation in the grammar.

11 While it is clear that counter-feeding is marked with respect to feeding, there is an open question as to whether bleeding or counter-bleeding is the marked relationship. Being opaque, and the challenge that presents for acquiring generalizations from surface forms, suggests that counter-bleeding is marked, while the historical changes cited above and the hypothesis of maximum rule application suggest that bleeding is marked.
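The serial architecture of (18)-(19) amounts to function composition: each stratum maps an input to an output, and the strata chain together. The sketch below is my illustration only (ASCII stand-ins as before: V for the raised diphthong nucleus, D for the flap, N for engma):

```python
import re

def stem_stratum(form):
    # *aiT >> IDENT(lo): raise the diphthong before a voiceless stop.
    return re.sub(r"ai(?=[ptk])", "Vi", form)

def word_stratum(stem, suffix=""):
    # Nothing relevant happens here beyond suffixation.
    return stem + suffix

def postlexical_stratum(form):
    # *VTV >> IDENT(son,voice): flap /t/ between vowels.
    return re.sub(r"(?<=[aeiouV])t(?=[aeiouV])", "D", form)

# /rait/ + /iN/ 'writing': raising applies at the stem level,
# flapping applies last, post-lexically.
out = postlexical_stratum(word_stratum(stem_stratum("rait"), "iN"))
print(out)  # "rViDiN": the raised diphthong survives before the surface flap
```

The opaque surface form falls out of the ordering of the strata, exactly as in the serial rule-based derivation; nothing in any single stratum is opaque.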



Finally, there are problems in acquiring opacity in this framework, despite the assertion otherwise in Bermúdez-Otero (2003), which is commendable for being one of the few concrete attempts at characterizing an opacity-learning algorithm. The problems are rooted in the fact that the child acquiring phonotactic generalizations at, say, eight months of age has no awareness of the morphological component of the grammar. It isn't clear whether this three-strata system is intended to be an innate component of UG or whether it is supposed to be acquired. If the hypothesis is that it is acquired, then it would be impossible to do so without any awareness of morphology. The eight-month-old would therefore have a single stratum at the early stages of phonotactic acquisition12 to deal with input like [rʌɪɾɪŋ]. A brief glance at the frequency of a few morphologically inflected and derived forms in the input a child receives from their parents shows that the frequencies of these forms in child-directed speech are not trivial and can't simply be ignored:

(20) CHILDES Parental Corpus (MacWhinney 2000) of 2.6 million tokens (24,000 types) of parent speech directed at children across a number of corpora13

    Stem      Frequency    Stem+Morphology    Frequency
    finish    797          finished           922
    walk      728          walking            243
    ride      859          riding             237
    eat       3960         eating             1046
    write     997          writing            245
    happen    1928         happened           372
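A quick computation over the counts in (20) makes the point concrete – for every stem, the inflected or derived form is a substantial share of the tokens the child hears:

```python
# Token counts copied from the table in (20): (bare stem, stem+morphology).
counts = {
    "finish": (797, 922), "walk": (728, 243), "ride": (859, 237),
    "eat": (3960, 1046), "write": (997, 245), "happen": (1928, 372),
}

# Share of morphologically complex tokens per stem.
for stem, (bare, inflected) in counts.items():
    share = inflected / (bare + inflected)
    print(f"{stem}: {share:.0%} of tokens carry morphology")
```

Even the lowest share (happened) is about 16% of tokens, and finished actually outnumbers finish – these forms cannot be set aside as noise.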

Given these inflected and derived forms and a single stratum, the desired phonotactic generalizations would be unlearnable, and the algorithm proposed by Bermúdez-Otero would be moot. A possible solution is to suggest that the phonotactic grammar re-organizes at some point after the acquisition of morphology. This would be quite an undertaking of re-ranking and re-deriving underlying forms for the entire lexicon and, as far as I know, evidence is lacking that such a drastic reorganization of phonotactic generalizations takes place. It is worth investigating further, however. If it is hypothesized that the three-stratum grammar is innate, this would present a problem for the claim that LPM-OT does not add any major additional components to UG. It is also curious to hypothesize that a pre-morphologically-aware child has separate levels in her phonological grammar that serve no purpose except as a place-holder for a subsequent linguistic capability. Aside from these objections, there are other problems in the learning algorithm proposed in Bermúdez-Otero (2003).

12 It is possible that there would be an additional level for the post-lexical level, enabling children to identify words regardless of their context – 'get' would be recognized as the same lexical item regardless of whether it was pronounced [gɛt] at the end of an utterance or before consonant-initial words, or [gɛɾ] as in 'get it'.
13 Next steps in this investigation include stratifying these numbers by age and looking at the phonetics of the children pronouncing these words.


The proposal starts by extracting the post-lexical generalizations that are surface-true of all forms. In the case of writing~riding the relevant process is the flapping of t/d, which can also be observed across word boundaries as in [hɪɾæn] hit Ann/hid Ann – therefore it is put in the post-lexical stratum. Evidence that there is a phonological change comes from observing that hit in isolation has [t] in the coda and hid a [d], not the flap. This leads to hypothesizing that surface [ɾ] is either underlying /t/ or /d/, including in forms such as mitre [mʌɪɾɚ]. Crucially, this also leads to the observation that the representation for write is [rʌɪt], in comparing write up [rʌɪɾʌp] and plain write [rʌɪt]. At this point the grammar is as shown in (21b), where T represents uncertainty on the part of the learner as to whether the representation at this level is t or d:

(21) LPM-OT grammar learning:

             writer     write    rider      ride     mitre      eyeful    lifer
    a)       [rʌɪɾɚ]    [rʌɪt]   [raɪɾɚ]    [raɪd]   [mʌɪɾɚ]    [aɪfl̩]    [lʌɪfɚ]
       PL stratum – Flapping
    b)       [rʌɪTɚ]    [rʌɪt]   [raɪTɚ]    [raɪd]   [mʌɪTɚ]    [aɪfl̩]    [lʌɪfɚ]
       Word stratum – Suffixation
    c)       [rʌɪTɚ]    [rʌɪt]   [raɪTɚ]    [raɪd]   [mʌɪTɚ]    [aɪ]      [lʌɪfɚ]

The learner is not aware of the morphological correspondence between writer and write (and rider and ride) and therefore cannot assume that the representation at this level is the same for the two forms. With flapping in place, the learner can now look at the next lower level for further generalizations. To do so, the learner first quarantines forms with a T in the representation, like mitre. This process is questionable from the point of view of known cognitive functions. It is what the linguist does when working through a problem set, using their explicit knowledge about the grammatical system, but it's not clear that a learner can do the same thing, given that their grammatical knowledge is implicit.14 After quarantining these forms, the learner finds no generalizations (relevant to the derivation) at the word level for this data set and proceeds to the stem-level phonology. Here, Bermúdez-Otero posits that the learner takes off all word-level suffixes, so that she is evaluating the forms in (c). She can then ascertain from write and ride the generalization that raised diphthongs occur before voiceless obstruents and lowered diphthongs before voiced obstruents. The problem here, again, lies in the fact that the learner does not know that eyeful is derived from eye. Therefore, on looking for a generalization regarding diphthongs, presented with forms like [aɪfl̩] and supposed exceptions like Cyclops [saɪklɑps], no generalization can be found. Indeed, in comparing the minimal pairs in (22), the learner is more likely to assume a phonemic alternation of the two diphthongs:

14 There is evidence that children regularize over inconsistent data if the ratios are correct (Hudson Kam & Newport 2005), but there is no statistical evidence given to suggest that that is the process here. This regularization capability is a part of the framework hypothesized in section XXXX.



(22) Minimal pairs in a child's grammar
    a) Eifel [ʌɪfl̩] vs. eyeful [aɪfl̩]
    b) cycle [sʌɪkl̩] vs. Cyclops [saɪklɑps]

Despite these problems, however, the idea that opaque relationships arise from effects at different phonological levels of the grammar – stem, word and post-lexical – is a valuable one and is discussed in section 5.

3.3. CONSTRAINT CONJUNCTION

The next theoretical proposal assessed is constraint conjunction (Lubowicz 1998, 2002; Ito & Mester 2003), where two disparate constraints can be joined together to form another constraint with its own independent ranking. The effects are seen when the conjoined constraint is ranked higher than either of the other two. This accounts for instances where the violation of one of two constraints is okay on its own, but violation of both is considerably worse. For example, in (23), if constraints 1 and 2 are ranked lower than constraint 3, then candidate 3 is the optimal candidate. If, however, constraints 1 and 2 are conjoined into a constraint ranked above constraint 3, then candidate 1 or 2 can be chosen. (23) Constraint conjunction

    /UR/          CON1&CON2   CON3   CON2   CON1
    (a) ☞ Cand1               *             *
    (b) Cand2                 *      *!
    (c) Cand3     *!                 *      *
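The schema in (23) can be run directly as code (a sketch of the mechanism, using the tableau's own schematic names). The conjoined constraint is violated only when both of its conjuncts are violated:

```python
# Violation profiles from the schematic tableau (23).
violations = {
    "Cand1": {"CON3": 1, "CON1": 1},
    "Cand2": {"CON3": 1, "CON2": 1},
    "Cand3": {"CON1": 1, "CON2": 1},
}

def conjoin(v):
    # CON1&CON2 is violated iff both conjuncts are violated.
    out = dict(v)
    out["CON1&CON2"] = int(bool(v.get("CON1")) and bool(v.get("CON2")))
    return out

def eval_ot(ranking, cands):
    # Standard OT EVAL: lexicographic comparison of violation vectors.
    return min(cands, key=lambda c: tuple(cands[c].get(k, 0) for k in ranking))

# Plain ranking: Cand3 wins, since it alone satisfies CON3.
print(eval_ot(["CON3", "CON2", "CON1"], violations))              # Cand3
# Conjoined constraint on top: Cand3 is now ruled out.
conjoined = {c: conjoin(v) for c, v in violations.items()}
print(eval_ot(["CON1&CON2", "CON3", "CON2", "CON1"], conjoined))  # Cand1
```

Whether candidate 1 or candidate 2 wins under the conjoined ranking depends only on the relative order of CON1 and CON2, matching the prose description of (23).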

This approach can deal with the mutual-bleeding opacity found in German codas. German has a well-known process of coda devoicing, as seen in (24), as well as a process of coda simplification when nasals are followed by voiced stops.

(24) Coda devoicing in German
    a) /taːg/    [taːk]    'day'     /taːg-ə/    [taːgə]    'days'
    b) /liːb/    [liːp]    'dear'    /liːb-n̩/    [liːbn̩]    'to love'

(25) Coda simplification
    a) /laŋg/       [laŋ]        'long'
    b) /dɪftɔŋg/    [dɪftɔŋ]     'diphthong'    /dɪf.tɔŋ.giː.rən/    [dɪf.tɔŋ.giː.rən]    'to diphthongize'

These two processes mutually bleed each other in words where a nasal + voiced stop occurs in the coda – only one of the two processes can apply, because the application of one obviates the need to apply the other. Thus, in /dɪŋg/, the two constraints can be satisfied either by deleting the coda /g/, [dɪŋ], or by devoicing it, [dɪŋk]. In fact, in standard German the former is chosen, but in a northern dialectal variant the latter is the correct form.


In standard German, the preference for deletion over devoicing as a repair strategy, evident in [dɪŋ], presents a problem for forms like /taːg/ [taːk], where devoicing is the preferred repair strategy over deletion (*[taː]). This is reflected in the tableaux in (26) and (27). The first tableau, for /taːg/, establishes the primacy of devoicing as a repair strategy over deletion. The grammar then predicts the same repair strategy for /dɪŋg/ → *[dɪŋk].

(26) MAX » IDENT(voice)

    /taːg/        *VC(+v)]σ   MAX-IO   ID(V)
    (a) taːg      *!
    (b) ☞ taːk                         *
    (c) taː                   *!

(27) Incorrect selection of [dɪŋk]

    /dɪŋg/        *NG]σ   *VC(+v)]σ   MAX-IO   ID(V)
    (a) dɪŋg      *!      *
    (b) ☹ dɪŋk                                 *
    (c) dɪŋ                           *!

Altering the constraint on voiced plosives in codas following nasals would remedy the problem for /dɪŋg/ (28), but creates a new problem for /baŋk/ [baŋk].

(28) Modification of constraint on voiced plosives…

    /dɪŋg/        *NK]σ   *VC(+v)]σ   MAX-IO   ID(V)
    (a) dɪŋg      *!      *
    (b) dɪŋk      *!                           *
    (c) ☞ dɪŋ                         *

(29) …yields invalid output for bank

    /baŋk/            *NK]σ   *VC(+v)]σ   MAX-IO   ID(V)
    (a) baŋk (attested)   *!
    (b) ☹ baŋ                             *

Constraint conjunction can remedy this problem by conjoining the constraint against coda stops after nasals with the constraint against changing voicing. This captures the generalization appropriately: it's okay to have a voiceless stop in coda position after a nasal, [baŋk], and it's okay to change the voicing of stops in coda position, [taːk], but you can't do both! A similar generalization can be made regarding the counter-feeding relationship discussed earlier for plant it [plæ̃tɪt]: it's okay to delete a nasal before a stop or to flap a /t/, but you can't do both:


(30) Constraint conjunction and English nasal deletion/flapping

    /plænt ɪt/       ID(SON,VOICE)&MAX-IO   *NC]σ   *VTV   ID(SON,VOICE)   MAX-IO
    (a) plæntɪt                             *!
    (b) plæ̃ɾɪt       *!                                    *               *
    (c) plænɾɪt                             *!             *
    (d) ☞ plæ̃tɪt                                    *                     *
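The local conjunction at work in the German pattern of (26)-(29) – *NK]σ & IDENT(voice), violated only when a devoiced stop surfaces in a nasal+stop coda – can be sketched as a checker over UR/surface pairs. This is my encoding, not the authors'; N stands in for the velar nasal:

```python
def ends_nasal_stop(form):
    # The *NK]s half of the conjunction: a nasal+stop sequence in the coda.
    return form[-2:] in ("Nk", "Ng", "mp", "mb")

def devoiced(ur, form):
    # The IDENT(voice) half: some underlying voiced stop surfaced voiceless.
    pairs = {"g": "k", "b": "p", "d": "t"}
    return any(pairs.get(u) == s for u, s in zip(ur, form))

def violates_conjunction(ur, form):
    # Local conjunction: fatal only when BOTH conjuncts are violated.
    return ends_nasal_stop(form) and devoiced(ur, form)

print(violates_conjunction("diNg", "diNk"))  # True:  *[diNk] is ruled out
print(violates_conjunction("ta:g", "ta:k"))  # False: devoicing alone is fine
print(violates_conjunction("baNk", "baNk"))  # False: an NK coda alone is fine
```

Each half of the conjunction is independently violable – [taːk] and [baŋk] each violate one conjunct and survive – but the combination is fatal, which is exactly the "you can't do both" generalization.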

Before addressing the empirical problems with constraint conjunction, I will first briefly address the learnability conundrum it introduces. Both constraint conjunction and the next theory, comparative markedness, also suffer from an inability to deal with counter-bleeding that is not a derived environment effect; that is addressed in section 3.5. The ability to combine any two constraints creates a significant problem in learning this sort of grammar. Given only 10 constraints, there are 45 possible pairwise conjunctions – 45 new constraints that are now part of the grammar and need to be evaluated and ranked. As has been suggested by Ito & Mester (2003), these conjoined constraints can then be recombined with others, iterating the possible conjunctions to a dizzying number. For a full grammar, the process of evaluating and ranking all of these constraints is intractable. Lubowicz (2002) offers a number of suggestions on limiting constraint interaction, but they are motivated by an a priori analysis of which conjunctions work for a particular problem and which don't.

3.4. COMPARATIVE MARKEDNESS

Comparative markedness (McCarthy 2002) divides markedness constraints into old markedness and new markedness. Old markedness refers to violations in the output that were also present in the input – or in the fully faithful candidate – and new markedness refers to violations that didn't originally exist in the input. When the two constraints are ranked together, this approach simply replicates standard OT. When they are ranked separately, this approach has the advantage of being able to account for a number of interesting problems in OT, such as grandfather effects, derived environment effects, non-iterative processes and counter-feeding opacity. The focus here will be on its ability to deal with opacity.
Returning to our English example of counter-feeding – the under-application of flapping as a result of nasal deletion in plant it [plæ̃tɪt] – the first step is to divide the markedness constraint into old and new. The important process is flapping, and so there are two constraints in place of the original *VTV: a constraint against non-flapped obstruents between vowels that also exist in the fully faithful candidate (*OVTV), and a constraint against the creation of new VTV sequences not in the original input (*NVTV). By ranking the old markedness constraint above /t/-faithfulness and the new markedness constraint below /t/-faithfulness, we can achieve the effect of making newly derived VTV sequences okay while still forcing flapping in old ones:


(31) Comparative markedness and the interaction of English nasal deletion and flapping

    /plænt ɪt/       *NC]σ   *OVTV   ID(SON,VOICE)   MAX-IO   *NVTV
    (a) plæntɪt      *!
    (b) plæ̃ɾɪt                       *!              *
    (c) plænɾɪt      *!              *
    (d) ☞ plæ̃tɪt                                    *        *

The correct form is selected because the unflapped /t/ sits in a VTV sequence newly created by other constraints in the system. This predicts that all newly created VTV sequences will be fine. Comparative markedness is the most promising of all the hypotheses in terms of learnability. At this point, I see no obvious problems with its acquisition, as it only increases the number of markedness constraints by a factor of two and relies neither on morphology nor on any unheard abstract forms or representations. The shortcoming lies in its inability to deal with counter-bleeding that isn't also a derived environment effect.

3.5. COUNTER-BLEEDING AND DERIVED ENVIRONMENT EFFECTS

In this section I highlight a major empirical shortcoming of both constraint conjunction and comparative markedness. While both can successfully account for derived environment effects – what Lubowicz (2002) categorizes as a type of counter-bleeding – neither can account for cases of counter-bleeding where the desired candidate is harmonically bounded by the wrong winner. I begin by detailing their approach to derived environment effects, then I offer a schematic proof of the type of data that they cannot account for. Finally, I provide data that is problematic in precisely this fashion. In the literature on constraint conjunction (Lubowicz 2002 inter alia) and comparative markedness (McCarthy 2002 inter alia), the focus has been on counter-feeding, which we've discussed above, and derived environment effects (DEEs). DEEs arise when a particular process, either morphological or phonological, creates a new structure, α. Another process is then triggered by α, but only when α is derived, not when it is underlying.
A classic example is ʔ-epenthesis in Makassarese (McCarthy & Prince 1994):

(32) Epenthesis in Makassarese
    a) /rantas/    [rantasaʔ]    'dirty'
       /tetter/    [tettereʔ]    'quick'
    b) /lompo/     [lompo]       'big'

When a vowel is epenthesized to remedy an invalid (coronal) coda, an epenthetic glottal stop is suffixed as well; but if the final vowel is an underlying vowel, then no glottal stop is epenthesized. This is opaque in the sense that while [rantasaʔ] reflects a generalization that words can't end in a vowel, forms like [lompo] violate this generalization.



This is problematic for standard OT because in derived environments the constraint against word-final vowels outranks the constraint against epenthesizing segments, while in non-derived environments the opposite ranking holds. An instance of a morphologically derived environment effect can be found in Korean (Ahn 1998), where palatalization occurs only when an alveolar is suffixed with an i, but not when the i follows an alveolar within the root:

(33) Korean palatalization
    a) path    'field'     pach-i    'field-COP'
       mat     'eldest'    mac-i     'eldest-NOM'
    b) mati    'knot'

Again, the question arises of what, precisely, is the phonological generalization to be learned regarding palatalization? It clearly isn't simply one based on the phonological environment in this case. The application of comparative markedness to these instances is straightforward. Ranking new markedness over faithfulness, which in turn outranks old markedness, will incur violations for new structures, but not for structures that were already in the input. In the case of Makassarese, the constraint against vowel-final words applies only if the final vowel is a new vowel – itself the result of certain codas being invalid. If it's an old vowel, however, the phonology lets it stand:

(34)

    /rantas/          N-*V]wd   CODA   DEP   O-*V]wd
    (a) rantas                  *!
    (b) rantasa       *!               *
    (c) ☞ rantasaʔ                     **

    /lompo/           N-*V]wd   CODA   DEP   O-*V]wd
    (d) ☞ lompo                               *
    (e) lompoʔ                         *!
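The split of *V]wd into old and new violations, as used in (34), can be sketched by comparing each candidate against the fully faithful candidate. This is my encoding of the idea, not McCarthy's; Q stands in for the epenthetic glottal stop:

```python
VOWELS = "aeiou"

def v_wd(ur, cand):
    # A final-vowel violation is "old" if the fully faithful candidate
    # (here, the input string itself) also ends in a vowel, "new" otherwise.
    old = new = 0
    if cand[-1] in VOWELS:
        if ur[-1] in VOWELS:
            old = 1
        else:
            new = 1
    return {"O-*V]wd": old, "N-*V]wd": new}

print(v_wd("rantas", "rantasa"))   # new violation: fatal under high N-*V]wd
print(v_wd("rantas", "rantasaQ"))  # no violation: ends in the glottal stop
print(v_wd("lompo", "lompo"))      # old violation: tolerated, ranked low
```

Ranking N-*V]wd above DEP and O-*V]wd below it then yields exactly the winners in (34): further epenthesis for the derived vowel, tolerance for the underlying one.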

A similar effect can be achieved with the constraint conjunction of DEP and *V]wd, which would allow the epenthesis of a vowel to ameliorate a bad coda, or a vowel-final word, but not both at the same time, triggering further epenthesis. Lubowicz (2002) terms this counter-bleeding, but it's not counter-bleeding in the sense that we've been using it. In the next section, I offer a schematic proof of what I believe is a novel assertion: that CM and CC can never account for data involving the genuine type of counter-bleeding.


3.5.1. THE PROBLEM OF COUNTER-BLEEDING: A FORMAL PROOF

Counter-bleeding is an instance of a process happening in an environment where it seemingly shouldn't have – in Kiparsky's terms, D in an environment other than _C, where B→D/_C. In abstract terms, the rule can be implemented with the following tableau:

(35) OT implementation of B→D/_C

    /BC/         *BC   FAITH(C)   FAITH(B)
    (a) BC       *!
    (b) ☞ DC                      *
    (c) BE             *!

BC is the marked structure that needs to be repaired; the ranking FAITH(C) » FAITH(B) ensures that B will change and not C. This rule can be rendered opaque by virtue of a subsequent process that changes the environment C that initiated the change of B→D – let's say C→E/_F. Because this changes the form of C, the environment conditioning B's change, we're dealing with bleeding/counter-bleeding. The OT tableau for this change looks identical:

(36)

    /CF/         *CF   FAITH(F)   FAITH(C)
    (a) CF       *!
    (b) ☞ EF                      *
    (c) CG             *!

These two tableaux can be merged for the form BCF, with FAITH(C) being the common point between the two. FAITH(F) is irrelevant in this case, and the ranking of *BC with respect to *CF doesn't affect the outcome:

(37)

    /BCF/         *BC   *CF   FAITH(C)   FAITH(B)
    (a) BCF       *     *
    (b) DCF             *                *
    (c) ☞ BEF                 *
    (d) DEF                   *          *

Transparent bleeding rule application is where the second rule bleeds the first, so:

(38) Bleeding rule interaction:
    /BCF/
    C → E / _F    BEF
    B → D / _C    ―

22

The combined tableau in (37) accounts for this transparent interaction between the two rules, since the output matches the output of the bleeding rule ordering. Opaque counter-bleeding rule application is where the second rule counter-bleeds the first:

(39) Counter-bleeding rule interaction:
    /BCF/
    B → D / _C    DCF
    C → E / _F    DEF

Notice that in the tableau in (37) no ranking of the existing constraints can ever output the counter-bled output DEF, because BEF harmonically bounds DEF. If one candidate harmonically bounds another, it means that it has a subset of the constraint violations of the other; that it violates no constraint that the other form doesn't also violate. Comparative markedness and constraint conjunction, which make use of existing constraints, can therefore never account for this type of relationship. The obvious question is: does this relationship exist in natural language? The answer is most certainly yes, as I show in the next section.

3.5.2. COUNTER-BLEEDING'S PROBLEM FOR COMPARATIVE MARKEDNESS: EMPIRICAL DATA

To illustrate the type of counter-bleeding that comparative markedness and constraint conjunction can't handle, the form writing will be used yet again. In (40), recall that the incorrect form [raɪɾɪŋ] harmonically bounds the desired form [rʌɪɾɪŋ], in that it doesn't have any constraint violations that the other form lacks. Therefore there is no combination of constraints – either through conjunction or by separating them into new and old – that can obtain the correct form [rʌɪɾɪŋ]:

(40)

    /raɪt-ɪŋ/        *aɪT   *VTV   IDENT(SON,VOICE)   IDENT(LO)
    (a) raɪtɪŋ       *(!)   *(!)
    (b) rʌɪtɪŋ              *!                        *
    (c) ☹ raɪɾɪŋ                   *
    (d) rʌɪɾɪŋ                     *                  *!
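The harmonic-bounding configuration shared by (37) and (40) reduces to a subset check over violation profiles, which can be stated directly (a sketch using the schematic profiles of (37)):

```python
# Violation profiles from tableau (37).
PROFILES = {
    "BCF": {"*BC": 1, "*CF": 1},
    "DCF": {"*CF": 1, "FAITH(B)": 1},
    "BEF": {"FAITH(C)": 1},
    "DEF": {"FAITH(C)": 1, "FAITH(B)": 1},
}

def bounds(better, worse, profiles):
    # 'better' harmonically bounds 'worse' iff its violations are a proper
    # subset of the worse candidate's; no ranking can then make 'worse' win.
    keys = set(profiles[better]) | set(profiles[worse])
    b = lambda k: profiles[better].get(k, 0)
    w = lambda k: profiles[worse].get(k, 0)
    return all(b(k) <= w(k) for k in keys) and any(b(k) < w(k) for k in keys)

print(bounds("BEF", "DEF", PROFILES))  # True: DEF can never be optimal
print(bounds("DCF", "BEF", PROFILES))  # False: these two are ranking-dependent
```

Since conjoined constraints and old/new splits are built from the violations already in the profiles, neither operation can break the subset relation – which is the formal content of the proof above.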

Thus, both comparative markedness and constraint conjunction are theoretically unable to handle the data we're interested in.

3.6. SUMMARY: THE INADEQUACY OF CURRENT THEORETICAL APPROACHES TO OPACITY

In this section, I discussed the major theoretical approaches to opacity within Optimality Theory, and each was problematic for one reason or another. In the case of Sympathy theory, there


were empirical problems in dealing with rule sandwiching, as has been pointed out previously, and there were also problems with implementing it in a learning theory, as it makes reference to an abstract form the learner will never hear. Stratal OT's main deficiency was its reliance on morphology in boot-strapping the acquisition of opacity. Constraint conjunction and comparative markedness were both insufficient for dealing with the particular type of counter-bleeding opacity of interest in this prospectus, and constraint conjunction was additionally not a plausibly learnable theory because of a factorial explosion in the number of constraints. This is summarized in table 1:

Table 1

    Approach to Opacity          DATA                     THEORY                         LEARNING
    (a) Sympathy Theory          ✗ DoY                                                   ✗ Reliance on abstract forms
                                 ✗ Rule sandwiching
    (b) Stratal OT                                        ✗ Too permissive               ✗ Reliance on morphology
                                                          ✗ Misses the generalization    ✗ Use of questionable
                                                            on opacity markedness          cognitive techniques
    (c) Constraint conjunction   ✗ Harmonically bounded                                  ✗ Too many new constraints
                                   counter-bleeding
    (d) Comparative Markedness   ✗ Harmonically bounded
                                   counter-bleeding

Aside from the theoretical challenges presented by opacity, the aspect of it that is of particular interest in this prospectus is the challenge it presents to a learning system. Regardless of the theoretical approach, the fact remains that the learner is required to learn a generalization that is obscured. Broadly, then, opacity can be defined as exceptionality in grammar; the particular type of opacity that is the focus here is opacity that arises from some other phonological process. Given this broad definition, opacity also includes derived environment effects, as exemplified in section 3.5 with Korean. The data, repeated in (41), cannot be described by appealing to counter-bleeding or counter-feeding rule ordering. Instead, they are opaque in the sense that there is a generalization that is obscured by the fact that it is triggered by a morphological process and therefore doesn't always hold.

(41) Korean palatalization
    a) path    'field'     pach-i    'field-COP'
       mat     'eldest'    mac-i     'eldest-NOM'
    b) mati    'knot'

The goal is to understand how systems like these can be learned.

4. AN ALTERNATE PROPOSAL

Given the learnability and/or empirical issues with all of the above approaches to opacity, I suggest an alternate approach that integrates a number of disparate approaches to learning and categorization. The core ideas are (A) that the acquisition of opacity can be divided into two stages,


Prospectus – MARC ETTLINGER – December 9, 2005

post-lexical opacity and morphological opacity, and B) that post-lexical opacity isn't in the abstract grammar, per se, but rather is the result of the phonetic implementation of the grammar. In the first stage, prior to the learning of morphology, I suggest that opacity is only found as a result of post-lexical processes. Since children have an early awareness of word boundaries (Saffran, Newport, & Aslin 1996), they must make a generalization regarding the flapping of /t/ in hit Ann [hɪɾæn] (cf. hit [hɪt], showing that hit is underlyingly /hɪt/) compared to the non-flapping in plant it [plænt ɪt] (cf. natty [næɾi], showing that flapping can occur after nasal vowels). While this cannot be learned using a categorical learning approach like the biased constraint demotion algorithm of Prince & Tesar (1999) (section 4.1.2), the flapping generalization can be learned using a gradient approach, like that of the Gradual Learning Algorithm (GLA; Boersma 1997; section 4.1.3). Expanding on the argument that variation does not belong in the grammar (Keller & Asudeh 2002), I suggest that while the learning process is gradient, the grammar is not. This can be formally represented by pairing the gradient GLA with a standard OT grammar rather than the stochastic grammar the GLA presupposes. This approach is potentially corroborated by psycholinguistic evidence showing that children regularize their input, even if it is irregular, into a completely categorical system (Hudson Kam & Newport 2005). The question of how post-lexical opacity can occur in this system, which would otherwise regularize the flapping of /t/ to [ɾ] in all cases, can be answered by incorporating ideas from an exemplar-based approach to the phonetics/phonology interface (Johnson 1997, Pierrehumbert 2001).
Exemplar-based approaches to phonology are rooted in experimental evidence showing that detailed linguistic and para-linguistic information is stored by the hearer/learner as part of their phonological system. A speaker could have the linguistic category /t/ include one exemplar of when Aunt Doris said it extremely carefully in the word heat at the end of a sentence, [hitʰ], another exemplar of when the speaker herself said it in extremely fast speech in the word butter, [bʌɾɚ], and so on. This part of the linguistic system seems far more amenable to variation, either conditioned or free, than the formal, abstract grammatical system suggested in Boersma (1997) and Anttila (1997). It also seems reasonable to suggest that exemplar-based phonology is a good way to represent the articulatory and perceptual manifestation of the categories of the abstract phonological system, and that it is at this phonology/phonetics interface that post-lexical processes, including post-lexical opacity, apply. Since flapping is post-lexical (it happens within and between words), the learner, on hearing [ɾ] in writer, would categorize it as /t/ and not /d/ because of the preceding vowel [ʌi]. This is based not on the abstract grammar, which doesn't yet have the vowel-raising generalization, but simply on the stored exemplars, comparing the [ʌiɾ] sequence to [rʌit] (write), [bʌit] (bite), *[rait], etc. Categorized as /t/ instead of /d/, the abstract grammar would have no difficulty in acquiring the generalization that diphthongs are raised prior to voiceless stops. Hypothesizing that the neutralization of /t/ and /d/ to [ɾ] is not phonological, but rather phonetic, suggests that there is some acoustic way of distinguishing the two (Barnes 2002), making this a testable hypothesis.

The second stage of opacity, the opacity that arises with the acquisition of morphology, likely involves the effects of paradigm leveling and constraints that make morphemes in different morphological contexts sound the same. This may be part of the grammatical system, manifested as output-output constraints in the grammar (Benua 1997), or may possibly be an exemplar effect as well. This is an area for further investigation.

This section goes through some of the arguments for a phonetic/phonological system of this sort. Section 4.1.3 justifies the use of the Gradual Learning Algorithm (GLA) as the phonotactic learning system by comparing it to alternative systems for acquiring the abstract phonology for opacity. Section 4.2 discusses the shortcomings of the GLA, most of which relate to the stochastic grammatical system associated with it, and not the learning algorithm itself. The main objection questions whether variation, conditioned and/or free, should be part of the abstract grammatical system. The alternative suggested in section 5 is that this information should be encoded in an exemplar-based model at the phonetics/phonology interface. The conclusion is that it is here that post-lexical opacity occurs, and so the acquisition of this type of opacity is solved by the correct categorization of tokens via exemplar-based perception.

4.1. PHONOTACTIC LEARNING

The goals of this prospectus fit within the broader aim of finding a phonological theory that also includes a learning process that correlates with human acquisition of language. A key component of this is a grammar and learning theory that can account for phonotactic learning, of which opacity is a part.
There are a number of different hypotheses for best representing the way we learn phonotactics, including models of OT-constraint learning (Tesar & Smolensky 2000, Hayes 1999), neural networks (Bergen 2002; Jurafsky & Narayanan 1996 for syntax), and probabilistic and information-theory based approaches (Goldsmith 2002, Coleman & Pierrehumbert 1997). In the next section, I briefly discuss why an information-theory based approach alone is not adequate for this investigation; I then go on to evaluate the two major approaches to constraint ranking that have been proposed for OT.

4.1.1. PROBABILISTIC APPROACHES

Probabilistic approaches to phonology assign a probability to each phonological unit, phonological units being single phones, combinations of two or more phones, and presumably features, syllables, word boundaries, or other traditional representations that have been shown to be relevant to the organization of grammar. The set of probabilities that assigns the highest chance of occurrence to the observed output is chosen as the best grammar. For example, in English, one can assign some positive probability to the occurrence of #ph in the grammar (# = word boundary) while assigning #p a probability of zero; this captures the generalization, albeit obliquely, that p's are aspirated in English in initial position. Aside from the computational advantages, this approach also has the virtue of being able to capture gradient effects, something that psycholinguistic evidence has shown is a factor in grammaticality judgments (Bergen 2002, Boersma & Hayes 2001, Anttila 1997). It can capture the generalization that p is a much more likely onset than h in English, even though both are grammatical, by assigning σ[p a higher probability than σ[h. No such effect can be incorporated into standard OT.
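The probability-assignment idea can be sketched with a toy bigram model. This is an illustrative sketch, not any particular system from the literature: the corpus, segment spellings (ph for aspirated p), and the zero-probability treatment of unseen bigrams are all hypothetical simplifications.

```python
from collections import Counter

def train_bigrams(words):
    """Estimate bigram probabilities over segment strings, with '#' as a
    word-boundary symbol (cf. the #ph vs. #p example in the text)."""
    counts, context = Counter(), Counter()
    for w in words:
        segs = ["#"] + w.split() + ["#"]
        for a, b in zip(segs, segs[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return {pair: n / context[pair[0]] for pair, n in counts.items()}

def prob(model, word):
    """Probability of a whole word as the product of its bigram probabilities."""
    segs = ["#"] + word.split() + ["#"]
    p = 1.0
    for a, b in zip(segs, segs[1:]):
        p *= model.get((a, b), 0.0)  # unseen bigram -> probability zero
    return p

# Toy English-like corpus: word-initial p is always aspirated (ph).
corpus = ["ph a t", "ph i n", "s p a t", "t a p"]
model = train_bigrams(corpus)

print(model[("#", "ph")])             # 0.5: #ph is attested
print(model.get(("#", "p"), 0.0))     # 0.0: plain #p never occurs word-initially
print(prob(model, "ph a t") > prob(model, "p a t"))  # True
```

The grammar is just the set of estimated probabilities; the "best" grammar is the one that makes the observed corpus most likely, which maximum-likelihood counting gives directly here.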
However, it isn’t completely clear that probabilistic and information-theory based approaches alone can account for the full complexity of phonological systems without the benefit of a larger framework, like OT.



First, no way of introducing functional constraints has been integrated into an information-theory based approach to phonology. Since the acquisition of the grammar is subject to the frequency of input forms alone, there are no implicational hierarchies, for example. If a language has some proportion of CCV syllables, an information-based approach doesn't say anything about that language's inclusion or exclusion of CV syllables, while OT correctly predicts that they should be allowed as well. There are no universal markedness constraints, either: such an approach treats a language as being as likely to have a voiced velar fricative in its phonemic inventory as a voiceless bilabial stop, despite what we know about the aerodynamic voicing constraint (Ohala 1983), and despite the fact that no attested inventory fits that description (Ladefoged & Maddieson 1996). Second, information-based theories currently don't produce cross-linguistic typologies. The tendency in one language to have CV syllables doesn't correlate with the same tendency in another language, since it simply arises from the frequency of that sequence in the input. Third, aside from corroborating the facts that adjacency is a key conditioning factor in phonological rules/constraints, that word breaks are important, and that segments are a useful phonological unit, information-based approaches don't have much to say about any of the other generalizations uncovered by phonologists. For example, no autosegmental analyses have been done, no investigation into paradigm effects, nothing on natural classes, or even on whether the distinction between consonants and vowels is meaningful; the list goes on. Finally, a probabilistic version of OT (discussed further below; Boersma 1997, Hammond 2004) can actually replicate an information-theory based approach by having the constraints be equivalent to all the phonotactic sequences in the information-theoretic grammar and assigning them weights that correlate with their probabilities.
Therefore, for the purposes of this investigation, OT will be used as the framework for capturing phonotactic generalizations at the phonological level. What follows is an investigation into the two major algorithms that have been suggested for learning constraint rankings in a grammar, with the conclusion that one of the two is far superior in its ability to handle opaque data.

4.1.2. BIASED CONSTRAINT DEMOTION (BCD; PRINCE & TESAR 1999)

All of the major theories of OT learning are based on the constraint demotion algorithm of Tesar and Smolensky (1993), which posits a universal set of constraints that must be ranked based on input data. The ranking is done according to a process that starts with the initial constraint set and proceeds to rank and re-rank the constraints on each new data point encountered until the constraint set is fully ranked. There are a number of crucial assumptions made in BCD, many of which are the same as the assumptions made in the original formulation of OT. First, the learner already has the complete set of constraints available to her as part of her universal grammar. That isn't to say that the constraints are all purely psychological in nature; OT constraints don't just reflect a psychological grammatical system, but can also incorporate physiological perceptual and articulatory constraints (Flemming 1995; Hayes, Kirchner & Steriade 2003).



Second, the learner assumes that the underlying representation is the same as the output. This mirrors the lexicon optimization hypothesis (Prince & Smolensky 1993), which states that for non-alternating forms, learners will organize their inputs so that they match the outputs. Third, all current incarnations of the constraint demotion algorithm begin with markedness constraints ranked higher than faithfulness constraints. This reflects the initial state of child production data (Gnanadesikan 1995, Smolensky 1996) and instantiates the conservative strategy whereby the child assumes that something is invalid in the language unless she has heard it. This assumption goes hand in hand with the previous one, because if there were no prior bias in the initial constraint rankings and inputs equaled outputs, then all grammars would end up with all the faithfulness constraints outranking the markedness constraints and no phonotactics captured in the grammar. These last two assumptions may be counter-intuitive in that it doesn't seem like any generalizations can be captured when the input equals the output. If markedness constraints are all ranked higher at the beginning, though, the learner can acquire the phonotactic generalizations of the language by demoting the markedness constraints that aren't active. For example, for simple syllable structure, the four relevant constraints are a constraint against codas (NOCODA), a constraint requiring onsets (ONSET), and two faithfulness constraints, one against epenthesis (DEP) and one against deletion (MAX). In the grammar's initial state, only CV sequences will be allowed, a prediction matched by early child language production (Gnanadesikan 1995).
When the child encounters a CVC syllable, this changes the ranking: assuming the input is /CVC/, the output of the grammar would have been either CV or CVCV, because markedness outranks faithfulness:

(42) Initial grammar state for syllable structure

  /CVC/     | NOCODA | ONSET | MAX-IO | DEP-IO
  (a) CV    |        |       | *      |
  (b) CVCV  |        |       |        | *
  (c) CVC   | *!     |       |        |

On hearing CVC and realizing that the grammar's output doesn't equal the output she heard, the learner demotes the constraint that caused the winner to lose far enough down that the heard form can now win:

(43) New grammar based on hearing CVC

  /CVC/       | ONSET | MAX-IO | DEP-IO | NOCODA
  (a) CV      |       | *!     |        |
  (b) CVCV    |       |        | *!     |
  (c) ☞ CVC   |       |        |        | *

This proceeds until all of the constraints are fully ranked.
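The demotion step in (42)-(43) can be sketched computationally. This is a minimal illustration of the demotion idea, not Prince & Tesar's full algorithm: evaluation simply compares violation vectors in ranking order, and the offending markedness constraint is demoted to the bottom rather than to a computed stratum.

```python
def evaluate(candidates, ranking):
    """Pick the candidate with the fewest violations, comparing constraints
    in ranking order (standard OT evaluation)."""
    def key(cand):
        return [candidates[cand].get(c, 0) for c in ranking]
    return min(candidates, key=key)

# Violation profiles for the /CVC/ input, as in tableau (42).
violations = {
    "CV":   {"MAX-IO": 1},   # deletes the coda consonant
    "CVCV": {"DEP-IO": 1},   # epenthesizes a vowel
    "CVC":  {"NOCODA": 1},   # faithful, but has a coda
}
ranking = ["NOCODA", "ONSET", "MAX-IO", "DEP-IO"]  # markedness >> faithfulness

heard = "CVC"
while evaluate(violations, ranking) != heard:
    # Demote the highest-ranked constraint that the heard form violates,
    # so the heard form can become the winner (constraint demotion).
    offender = next(c for c in ranking if violations[heard].get(c, 0) > 0)
    ranking.remove(offender)
    ranking.append(offender)

print(ranking)  # ['ONSET', 'MAX-IO', 'DEP-IO', 'NOCODA']
```

A single demotion of NOCODA below the faithfulness constraints suffices here, mirroring the move from tableau (42) to tableau (43).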


For a data set with opaque stimuli, the BCD, as well as all other similar algorithms, either fails to recognize that there is any generalization to be found (much like the student in the introduction of the prospectus) or is caught in what Pulleyblank & Turkel (1997) refer to as a trap. In a trap, the grammar is not able to reach stasis and alternates back and forth between two grammars. This can be demonstrated by observing the operation of the BCD on the opaque interaction in the dialect of English where the t isn't flapped in plant it [plænt ɪt]. The relevant constraints here are *VTV and IDENT(SON, VOICE). The initial constraint ranking, based on the M » F bias, would be:

(44) Initial constraint ranking: *VTV » IDENT(SON, VOICE)

The learner then hears the form plant [plænt] and assumes the UR /plænt/:

(45) Initial constraint ranking:

  /plænt/       | *VTV | IDENT(SON, VOICE)
  (a) ☞ plænt   |      |

This incurs no violations of the relevant constraints and the correct output is generated, so no re-ranking of constraints is required. At some subsequent point, the learner hears plant it [plænt ɪt]. The phrase contains plant, and so the underlying representation is /plænt ɪt/. The current grammar would flap the t, so a re-ranking is required, moving *VTV below the relevant faithfulness constraint:

(46)

  /plænt ɪt/      | IDENT(SON, VOICE) | *VTV
  (a) plæ̃ɾɪt      | *!                |
  (b) ☞ plæntɪt   |                   | *

At this point, the flapping generalization and the markedness of *VTV are lost, already an undesirable result. Now, if presented with an example like hit Ann [hɪɾæn], the faithfulness constraint has to be re-ranked so that the t flaps. These constraints will alternate back and forth, over and over, depending on the most recently heard input, with *VTV » IDENT(SON, VOICE) after hearing transparent data and IDENT(SON, VOICE) » *VTV after opaque data.

(47)

  /hɪt æn/      | IDENT(SON, VOICE) | *VTV
  (a) hɪɾæn     | *!                |
  (b) ☞ hɪtæn   |                   | *

(The heard form (a) loses under this ranking, triggering yet another re-ranking.)
In sum, the set of constraint demotion algorithms fails to accommodate a language where the data contain both transparent and opaque forms.

4.1.3. GRADUAL LEARNING ALGORITHM (GLA; BOERSMA 1997)

The Gradual Learning Algorithm was introduced to account for a number of deficiencies in the constraint demotion algorithm, as well as for a number of empirical phenomena previously not incorporated into OT. What differentiates the GLA from the BCD is the type of grammar it presupposes: instead of a standard OT grammar, the GLA assumes a stochastic grammar in which constraint rankings are not discrete but lie on a scale of constraint strictness. Each constraint is assigned a scalar value and, when evaluated, can fall anywhere along a normal distribution around that value. For example, two constraints, C1 and C2, can be ranked as shown in figure 1:

Figure 1: Overlapping ranking distribution (from Boersma & Hayes 2000)

In this case, the ranking will favor C1 » C2 most of the time, but the variability of each constraint will result in the opposite ranking, C2 » C1, some fraction of the time. This provides the advantage of being able to account for free variation in speech. In Norwich English, Trudgill (1972) showed that there was variation in the pronunciation of final -ing as either a normal [ɪŋ] or with a syllabic n, [n̩]. The study discussed variation by class, but variation was also found within a single speaker's grammar. In one speaker, the variation was 72% [ɪŋ], 28% [n̩]. In stochastic OT, this is accounted for by ranking two constraints as shown in figure 1. If C1 is a constraint against deletion and C2 a constraint against unstressed lax vowels in final syllables, and if each constraint's distribution has a standard deviation of 2, then assigning C1 a constraint strictness of about 84 and C2 a constraint strictness of about 82.35 will give C1 » C2 roughly 72% of the time and C2 » C1 28% of the time, approximating the free variation of this particular speaker. When there isn't free variation in the ranking between two constraints, as in the case of strict ranking, C1 and C2 can be spaced far enough apart that the chance of overlap becomes vanishingly small. This isn't the aspect of the GLA that is useful for opacity; opacity isn't an instance of free variation and is a strictly deterministic phenomenon as governed by the phonology. What is useful is the constraint ranking algorithm associated with it and its capacity for dealing with erroneous or exceptional data.
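The ranking probability can be computed in closed form. In this sketch I assume, following the standard stochastic-OT setup, that each constraint's evaluation-time position is its strictness value plus Gaussian noise with standard deviation 2; the strictness values 84 and 82.35 are illustrative choices (about 1.65 apart) that reproduce a 72/28 split under that assumption.

```python
import math
import random

def p_outranks(mu1, mu2, noise_sd=2.0):
    """P(C1 evaluates above C2) when each constraint's effective ranking is
    drawn from a normal distribution around its strictness value."""
    diff_sd = math.sqrt(2) * noise_sd               # sd of (noise1 - noise2)
    z = (mu1 - mu2) / diff_sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF

analytic = p_outranks(84.0, 82.35)
print(round(analytic, 3))  # 0.72

# Monte Carlo check: sample evaluation-time rankings directly.
random.seed(1)
n = 100_000
wins = sum(random.gauss(84.0, 2) > random.gauss(82.35, 2) for _ in range(n))
print(wins / n)
```

Widening the gap between the strictness values drives the probability toward 1, which is how stochastic OT recovers effectively strict rankings.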


The ranking algorithm is similar to the BCD except that instead of completely re-ranking the relevant constraints on encountering an output that requires a re-ranking, the constraints are adjusted by some incremental amount. When enough tokens of data are encountered, the adjustments accumulate enough to create a categorical distribution of the rankings. This approach is useful for opaque relationships when there are enough transparent tokens to outweigh the opaque ones. For every opaque token encountered that inches down the constraint maintaining the [t] on the surface (49), there would ideally be multiple transparent tokens inching *VTV up even faster (48):

(48)

  /hɪt æn/     | *VTV | IDENT(SON, VOICE)
  (a) ☞ hɪɾæn  |      | *
  (b) hɪtæn    | *!   |

(49)

  /plænt ɪt/     | *VTV | IDENT(SON, VOICE)
  (a) plæ̃ɾɪt     |      | *
  (b) plæntɪt    | *!   |

Under the transparent ranking in (48), the opaque token in (49), where the heard form is plæntɪt, is an error for the learner, and the GLA responds by nudging *VTV down and IDENT(SON, VOICE) up by a small step.
This approach makes a number of interesting predictions. First, it suggests that tokens of transparent relationships will far exceed tokens of the correlated opaque relationships in the data.[15] Second, it suggests that the constraint in question will not be as categorical as completely transparent constraints, leading to some small degree of variation, some phonetic manifestation, or effects on grammaticality judgments.

4.2. VARIATION IN GRAMMAR?

Anttila (1997), following up on the idea that non-crucial constraint ranking correlates with free variation (Kiparsky 1993, Reynolds 1994, Kager 1994), found some striking evidence in support of this claim from Finnish genitive plurals. He started by assuming that constraints were ranked freely when not crucially ranked, and that free variation in pronunciation was the result of this variable ranking of constraints. Then, by simply calculating the ratio of the number of rankings that produced each variant, he obtained frequency expectations that matched the data extremely well. For example, the genitive plural for the stem korjaamo 'repair shop' has two possible candidates: kor.jaa.mo.jen and kor.jaa.moi.den. To obtain predictions about variation, Anttila arranged the constraints for the genitive plural into five strata, with strict ranking between strata but free ranking within a stratum. The two genitive plural candidates have equal numbers of violations

15. I believe this corroborates the claim that opaque relationships are unstable and marked while transparent relationships are unmarked and more regular. I'd have to work through the details, but I think that if opaque tokens outnumber transparent tokens, then some interesting grammatical reorganization is likely to take place.


for all constraints in the top three strata, so the outcome is determined by constraints in the fourth stratum. There, kor.jaa.mo.jen has more violations of the constraint against consecutive unstressed syllables than the alternative, while kor.jaa.moi.den has more violations of the constraints against two consecutive heavy syllables and stressed heavy syllables, and of two other constraints. If the constraint against consecutive unstressed syllables is ranked on top of the other four, then kor.jaa.moi.den wins; if any of the other four constraints is on top, then kor.jaa.mo.jen wins. This predicts that kor.jaa.moi.den is the output 20% of the time (one out of five rankings) and kor.jaa.mo.jen 80% (four out of five). These values match the attested values in Anttila's corpus, 17.8% and 82.2%, extremely well. Stochastic OT operates on a similar principle, generating free variation from alternations in constraint ranking as shown above, and has achieved similarly accurate results.

Despite this empirical success, however, it is a dubious proposition to include variation as part of the OT grammar. First, it's questionable whether variation is ever truly "free". Sociolinguists have provided volumes of evidence showing that variation is conditioned by myriad social factors. For example, Labov (1966) shows that r-deletion in the New York City dialect of English was correlated with the speaker's social class and speech style and wasn't deleted at a constant rate across the speech population; Bergen (2002) shows that a number of factors, including gender, age, the orthographic representation of the word in question, and phonological effects, all interact to condition French liaison. Even within a single speaker, and holding social factors constant, other factors like speech style, speech rate, mental state, degree of sobriety, and environment all play a part in what acoustic signal comes out of the mouth of a speaker.
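Anttila's ranking-ratio calculation can be verified by brute force. This sketch uses hypothetical constraint names and deliberately simplified violation profiles: one constraint ("LAPSE", penalizing kor.jaa.mo.jen) against four that penalize kor.jaa.moi.den, which is all that matters for the decision in the fourth stratum.

```python
from itertools import permutations

# Five freely ranked constraints in the decisive stratum (names hypothetical).
constraints = ["LAPSE", "C1", "C2", "C3", "C4"]
violations = {
    "kor.jaa.mo.jen":  {"LAPSE": 1},
    "kor.jaa.moi.den": {"C1": 1, "C2": 1, "C3": 1, "C4": 1},
}

wins = {"kor.jaa.mo.jen": 0, "kor.jaa.moi.den": 0}
for order in permutations(constraints):
    # Standard OT evaluation under this total ordering of the stratum.
    best = min(violations,
               key=lambda cand: [violations[cand].get(c, 0) for c in order])
    wins[best] += 1

total = sum(wins.values())
print(wins["kor.jaa.moi.den"] / total)  # 0.2
print(wins["kor.jaa.mo.jen"] / total)   # 0.8
```

kor.jaa.moi.den wins exactly when LAPSE is ranked highest, which happens in 4!/5! = 1/5 of the 120 orderings, reproducing the 20%/80% prediction.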
In light of the non-free nature of variation, it's unclear what Anttila's results actually mean beyond, perhaps, coincidence. In the case of stochastic OT, the accuracy is simply an exercise in data fitting, exhibiting some amount of correlation between variables. Second, if the intent is to include all functional constraints on speech as part of the "grammar" in order to truly be able to predict frequencies, then the universal constraint set should include the factors listed above, a *RUDE constraint, for example.[16] This would wreak havoc on factorial typologies since, as Keller & Asudeh (2002) point out, the typology of languages would need to reflect the possible rankings of, say, individuals' memory constraints. As they also point out, if performance and variation are part of the OT grammar, then this creates problems for lexicon optimization as well. If input matches output, and a different output is generated by each constraint ranking as affected by performance factors, then a separate input is needed for each realization of a morpheme: its pronunciation in fast speech, careful speech, etc. This would result in a specious proliferation of lexical items.

If variation isn't part of the OT grammar, then where is it? In the next section, I suggest that an exemplar-based approach is better suited to capturing this information. In addition, since post-lexical effects are subject to conditioning by speech rate, speech style, and other

16. In many ways, similar arguments can be made about the role of synchronic phonetics in phonology, a position argued against in Blevins (2004) and Barnes (2002), inter alia.


functional considerations, I suggest that post-lexical opacity should be treated the same way, eliminating opacity from the OT grammar.

4.3. SUMMARY

To summarize where we've been in this section: I started by showing that between the GLA and the BCD, the two main learning algorithms proposed for OT constraints, only the GLA is viable for learning opacity; the BCD would be trapped in a continuous loop, ranking and re-ranking the two constraints relevant to the opacity depending on the most recently heard word. The GLA, however, is generally associated with a grammatical model, stochastic OT, that uses the variable, stochastic ranking of constraints to account for things like variation. This is not an appropriate function for the OT grammar, however. Instead, the GLA should be paired with a standard OT grammar. The two can fit together by eliminating the variability of the ranking of each constraint (its normal distribution around its ranking value) and making the rankings categorical. The rankings still need to be quantified relative to each other so that they can be incrementally adjusted by the GLA. The result is a system that takes gradient, stochastic input and creates a categorical system. This is not a novel idea, as there is strong evidence in the psycholinguistic literature that this is, in fact, what children do with variable input. For example, Hudson Kam & Newport (2005) showed that children (6;4.10) regularized the use of the determiner in an artificial language learning experiment, even when it was used inconsistently in the input. So, whereas adults learn the grammar veridically, producing the same percentage of determiners as was in the input, children learn over inconsistent data in a way that is useful for learning opacity and correlates with a GLA learning algorithm paired with a strict-ranking grammar.
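The child/adult contrast can be sketched as two learning policies over the same inconsistent input. The 60/40 determiner rate and production counts here are hypothetical numbers in the spirit of the Hudson Kam & Newport finding, not their actual stimuli.

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical input: a determiner appears on 60% of nouns, inconsistently.
input_tokens = ["DET"] * 60 + ["NULL"] * 40

def adult(tokens, n_productions=100):
    """Probability matching: reproduce the input rates in production."""
    return Counter(random.choice(tokens) for _ in range(n_productions))

def child(tokens, n_productions=100):
    """Regularization: adopt the majority variant categorically."""
    majority = Counter(tokens).most_common(1)[0][0]
    return Counter([majority] * n_productions)

print(child(input_tokens))         # Counter({'DET': 100}): fully categorical
print(adult(input_tokens)["DET"])  # near 60, varying with the sampling
```

The child policy is the behavior a GLA-trained but categorically evaluated grammar would show: gradient statistics in learning, a strict ranking in production.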
An interesting prediction this makes is that adult second-language learners should not be able to learn the opaque relationship categorically and should make "mistakes" according to the ratio of opaque to transparent tokens in their input.

5. CATEGORY THEORY AND EXEMPLAR-BASED MODELS

The importance of the distinctive feature to phonological theory cannot be overstated. Witness statements like that of Robins (1977): "The distinctive feature has proved to be one of the most important theoretical concepts of our science during the past half-century," or Hockett's assertion, in his 1964 LSA presidential address, that one of the four major breakthroughs in the history of linguistics was the quantizing hypothesis, whereby every consonant and vowel sound of every language is made up of a finite set of phonological units. Recently, however, linguists have begun to consider alternate methods of representation. In this section, I suggest that an exemplar-based representation of the phoneme can account for the post-lexical opacity discussed above.

5.1. BACKGROUND

If we conceive of every phoneme as a category, then the approach of using a set of features to define phonemes is rooted in a tradition dating back to Aristotle that went unchallenged until the second half of the 20th century. This classical view holds that all instances of a category have some fundamental characteristics in common that warrant their membership in the category in


question. It takes as the mental representation of a category a list of features that are individually necessary for membership in the category and collectively sufficient to determine category membership (Medin 1989). For example, the category triangle can be defined as A) a closed geometric form B) having three sides and C) three interior angles that sum to 180 degrees. Similarly, a t can be defined as A) coronal B) not sonorant C) voiceless D) not continuant and E) anterior. These features differentiate t from all other segments, and so the category of an input phone can be ascertained by determining the status of each of these features. The rejection of this view in favor of other models started with Wittgenstein's seminal work in 1953, which challenged both the idea that category membership is clearly delineated and the idea that a set of necessary and sufficient conditions is the appropriate way to define a category. Since Wittgenstein, a number of alternate theories of categorization have been proposed, including fuzzy set theory (Zadeh 1965), prototype theory (Rosch & Mervis 1975), exemplar theory (Smith & Medin 1981), and theory-based conceptions of categories (Murphy & Medin 1985, Lakoff 1987). The recent proposals are based on a number of interesting empirical findings regarding human categorization. First, feature-based approaches fail to account for the fact that people judge some members of a category as better examples than others, such as judging a robin a better example of a bird than an ostrich. A similar effect is likely to be found in phoneme categories as well: [t] and [tʰ] are both manifestations of the same category and share the same distinctive features, but [tʰ] would probably be judged a better example of /t/ than [t] by an English speaker. Second, the classical view cannot account for unclear cases and contested categories (Gallie 1956).
In the classical theory, a rose is a rose is a rose, but in human categorization there are disagreements over category membership depending on the person or the situation. Is cheese a dessert or an hors d'oeuvre? How tall does a bush have to be to become a tree? In the realm of linguistics, this would be manifested in something like different people using different values of voice onset time to differentiate a /d/ and a /t/. This would presumably be conditioned by the listener and the linguistic and para-linguistic context. While classical theory cannot account for these effects, both prototype and exemplar-based theories can. Prototype theory uses a prototype, a summary representation of all of the members of the category a person has experienced, to determine category membership. It is either an example or an ideal that possesses all of the characteristic features of the category. The category is formed based on experience with members of the category; people then abstract out the central tendencies, which become the mental representation of the category. The exemplar view denies that there is a single summary representation of the category and instead claims that categories are represented by the collection of all the examples, all stored in the brain. Therefore, a tallish bush is judged a tree based on its similarity to all the other trees the perceiver has seen prior. Prototype theory and exemplar theory are similar in that they judge category membership by assessing the similarity of a token to either one prototype or to a collection of exemplars.
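The two modes of similarity-based judgment can be contrasted in a one-dimensional sketch. The VOT values are hypothetical, and the exemplar classifier is a deliberately simplified one-dimensional version of a generalized-context-model-style similarity sum; neither is any particular published model.

```python
import math

# Hypothetical voice onset time values (ms) for stored tokens of /d/ and /t/.
exemplars = {"d": [5, 10, 15, 20, 12], "t": [60, 70, 80, 55, 90]}

def prototype_classify(vot):
    """Prototype view: compare the token to each category's mean VOT."""
    protos = {cat: sum(v) / len(v) for cat, v in exemplars.items()}
    return min(protos, key=lambda cat: abs(vot - protos[cat]))

def exemplar_classify(vot, sensitivity=0.05):
    """Exemplar view: similarity summed over every stored token, so that
    category size and token detail both influence the judgment."""
    support = {cat: sum(math.exp(-sensitivity * abs(vot - v)) for v in toks)
               for cat, toks in exemplars.items()}
    return max(support, key=support.get)

print(prototype_classify(30))  # 'd'
print(exemplar_classify(30))   # 'd'
```

The prototype classifier keeps only the category means, while the exemplar classifier retains every token, which is exactly the representational difference the text describes: the exemplar model is sensitive to the full distribution of experienced tokens, not just its central tendency.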

One way to judge similarity is in terms of shared and overlapping features, so the assessment of the category membership of a d can be made by comparing it to prototypical t's and d's in terms of voice onset time, which corresponds to the voicing feature. The two hypotheses differ in that exemplar-based approaches retain all of the details of each exemplar, which a prototype loses. Psychological experimentation into people's sensitivity to things like category size and particular details about experienced exemplars therefore points to exemplar-based approaches as being superior in this respect.

Despite these successes, there are further empirical facts that present challenges for category theoreticians. One problem is that neither prototypes nor exemplars seem well suited to account for conceptual combination: a pet bird is neither a prototypical pet nor a prototypical bird. This type of combination is a dynamic process, so in most instances it isn't reasonable to assume every ADJ-NOUN combination has its own prototype. Within linguistics, this correlates to the question of whether there is an exemplar or prototype for every phone in each context it can appear in. Assuming not, it isn't clear how the mind captures what a [t] sounds like adjacent to an [n], to an [o], and so on, particularly if there's been no exemplar for it. Similarly, prototypes and exemplars don't take context into account very well. My prototype for a drink will differ based on whether I'm in a bar or have just run a marathon. Likewise, the category for [t] will be subject not only to local phonological factors like the adjacent segments, but also to factors such as accent, speech style, and physical environment, all of which bear on how an acoustic signal is categorized. A final question relates to how similarity is determined.
A cat and a rat share innumerable features, including being animate, solid, existing on our planet and so on, and yet a select few are chosen by humans as important differentiating factors. In phonetics, it isn't clear where the organizing principles of the categories come from, though there has been some interesting research on how physiology and self-organizing principles play a part (Mielke 2004; Wedel 2004). So while impressive strides have been made in category theory over the past few decades, there are still a number of outstanding questions. It is, however, fairly clear that while classical categories may be useful in artificial systems, they aren't well suited to systems of the mind.

5.2. EXEMPLARS IN LINGUISTICS

Linguists have also been contributing to this body of research, questioning the classically-rooted, feature-based representation of sound in favor of the alternative ways of representing categories discussed above. In a linguistic exemplar model, each category is represented by a collection of all of the tokens of that category. In addition to being labeled, or identified as a member of a category such as /ɛ/, each token carries with it information about its phonological context (in the word bet), the speaker, and any other information the hearer is able to abstract. Thus the category /ɛ/ is said to consist of tokens with varying formant values, each with associations to other categories like /bɛt/ and Dad's voice. When a new token is heard, it is categorized by comparing every aspect of the input to the tokens in memory. Thus if the hearer hears their father say it, the input will activate the Dad's-voice exemplars, and the input formant values are compared to those stored values closest to it. Once the category is determined, the exemplar becomes another member of the category's collection.

The Acquisition of Opacity

The experimental paradigms supporting this view generally revolve around showing that hearers can recall, or are sensitive to, detailed speaker information. For example, Mullennix, Pisoni & Martin (1989) found that speaker variability (using different speakers for each word) slowed down word identification compared to keeping the speaker the same. Similarly, hearers could recall the particular voice that uttered a word surprisingly well (Schacter & Church 1992). This suggests that at some level of representation, a detailed acoustic signal is captured and remembered as an exemplar. This contrasts with the speaker normalization hypothesis, which holds that phonetically irrelevant details are filtered out and the remainder is compared to a prototype or to a classical category defined by features. Detailed phonetic knowledge also comes into play in a number of other areas. Speakers are able to differentiate the phonemes of their language or dialect from others' based on subtle phonetic cues on phonemes that otherwise share the same set of features (Bradlow 1995). Also, Bybee (Bybee 2001, Hooper 1976) has shown that schwa deletion correlates with word frequency, suggesting that phonological information is not purely abstracted to the level of the phoneme. These results challenge phonological theory in a number of ways. First, the lexicon is supposed to be separate from the phonological system, and the phonological system is supposed to operate consistently across all inputs. This is not the case if different words show different rates of schwa deletion. Also, the phonological system does not encode things like frequency, something that clearly affects the phonetic output.
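As a sketch, the kind of multi-attribute exemplar store described above might look like the following, where each stored token pairs an acoustic measurement (a lone F1 value standing in for the full signal) with contextual labels. The values, words and speakers are all invented for illustration; this is not a proposal from the literature.

```python
# Toy exemplar store: acoustic value plus contextual labels per token.
stored = [
    {"f1": 550, "word": "bet", "speaker": "Dad", "category": "ɛ"},
    {"f1": 560, "word": "bet", "speaker": "Mom", "category": "ɛ"},
    {"f1": 700, "word": "bat", "speaker": "Dad", "category": "æ"},
    {"f1": 690, "word": "bat", "speaker": "Mom", "category": "æ"},
]

def categorize(f1, speaker=None):
    """Sum each category's activation over its exemplars; activation falls
    off with acoustic distance and is boosted when a contextual label
    (here, the speaker) matches the input."""
    scores = {}
    for ex in stored:
        activation = 1.0 / (1.0 + abs(f1 - ex["f1"]))
        if speaker is not None and ex["speaker"] == speaker:
            activation *= 2.0  # hearing Dad activates the Dad's-voice exemplars
        scores[ex["category"]] = scores.get(ex["category"], 0.0) + activation
    winner = max(scores, key=scores.get)
    # once categorized, the token joins the store as a new exemplar
    stored.append({"f1": f1, "word": None, "speaker": speaker, "category": winner})
    return winner
```

The final `append` implements the claim that every categorized token itself becomes another member of the category's collection.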
Finally, the subtle cross-linguistic differences in phoneme quality challenge the idea that there is a universal set of features corresponding to a universal set of motor sequences defining the phonemic inventory of a language. While exemplar-based models can successfully account for the above experimental data, it is less clear how an entire phonological system would be organized around such an approach. At the phonological level, exemplar models have been used to account for the emergence of the categorical behavior of phonemes (Wedel 2004; Mielke 2004) and for certain use-based effects like neutralization, lenition and entrenchment (Pierrehumbert 2001), but as of yet no model capturing, for example, the palatalization of fricatives before high vowels has been proposed. The suggestion here is that, through the process of phonologization (Hyman 1977), these exemplar categories are inputs into a phonological system, like that of OT. The interface is at the level of the phonemes that make up the input and output of the phonological system, so that a [t] in the output corresponds to the exemplar category of [t]; the same holds of underlying /t/ as well.

5.3. EXEMPLAR EFFECTS AS POST-LEXICAL PHENOMENA

There is significant evidence that the word is a psychologically real entity, more so than the underlying representation or the surface representation (Schane 1971). Many of the arguments are the same as the justification for lexical phonology and a separate post-lexical level (Kiparsky 1982): The surface representation is too detailed in light of speaker intuitions that t~d are different in words like atom and Adam despite their neutralization to a flap intervocalically, or in light of speakers' lack of awareness of the different allophones of stops depending on whether they are syllable-initial or not ([spɪn] vs. [pʰɪn]). On the other hand, the underlying representation seems too abstract in light of evidence from word games and the fact that the pronunciation of the plural suffix is understood to vary and is not considered to be the same sound. Given that exemplars reflect word-level information, as mentioned above in the work of Bybee, it is therefore conceivable that post-lexical effects should be modeled with exemplars. There is also evidence that post-lexical effects like flapping are the most likely to interact with, and be affected by, other factors such as speech style and speed (Hammond 1990). Bergen (2002) also showed that French liaison, another post-lexical process, is not purely phonological but is also conditioned by factors like orthography, speaker sex and class, syntactic category and frequency: all information that is captured in exemplars but not in the phonological system. For instance, in the case of writer in the dialect discussed above, the output of the abstract phonological component of the grammar is |rʌɪtɚ| (bars indicate the output prior to post-lexical implementation). Each segment represents a production target for the exemplar system, which takes the category the segment represents and looks for the exemplars that most closely match the linguistic and para-linguistic environment. In rapid or natural speech, most of the intervocalic exemplars in the speaker's |t| category have the phonetic characteristics of a flap. The word is therefore articulated as [rʌɪɾɚ]. In perception, the process is reversed.
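A minimal sketch of this production step, and of its reversal in perception, might look as follows. The exemplar store, the context labels and the similarity criterion (exact label matching) are all simplifying assumptions for illustration, not the author's model.

```python
from collections import Counter

# Hypothetical store for the |t| category: each remembered token pairs a
# phonetic realization with the context in which it was heard.
t_exemplars = [
    {"realization": "ɾ",  "position": "intervocalic", "style": "rapid"},
    {"realization": "ɾ",  "position": "intervocalic", "style": "rapid"},
    {"realization": "t",  "position": "intervocalic", "style": "careful"},
    {"realization": "tʰ", "position": "initial",      "style": "careful"},
]

def produce(position, style):
    """Production: from the exemplars matching the current linguistic and
    para-linguistic context, pick the most common realization."""
    matching = [ex["realization"] for ex in t_exemplars
                if ex["position"] == position and ex["style"] == style]
    return Counter(matching).most_common(1)[0][0]

def perceive(realization, preceding_vowel):
    """Perception, reversed: an ambiguous flap is assigned to |t| or |d|
    on the basis of the local environment; a raised diphthong (written
    'ʌɪ' here) signals an underlying voiceless stop."""
    if realization == "ɾ":
        return "t" if preceding_vowel == "ʌɪ" else "d"
    return realization
```

In rapid intervocalic contexts `produce` selects the flap, and `perceive` maps that flap back to the category |t| whenever the raised diphthong is present, which is exactly the categorization step that makes the learner's input transparent.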
The hearer hears [rʌɪɾɚ] and must categorize each segment. The [ɾ] can potentially belong to two categories in the phonology: |t| or |d|. Based on the local environment (here the crucial factor is the preceding raised diphthong [ʌɪ]), the hearer categorizes the flap as a |t|. At this point the phonological system hasn't yet come into play, only the basic categorization process. In acquisition, a similar process allows for the acquisition of opacity: if the flap is simply categorized as a |t|, then the input to the phonological learning algorithm is |rʌɪtɚ|, a transparent input in terms of acquiring the voiceless-obstruent environment for diphthong raising.

5.4. SUMMARY

This section provided an overview of how an exemplar-based theory of categorization can be integrated with an abstract formal phonology like OT. The move away from a purely feature-based approach to defining categories is certainly a positive one and is supported by recent developments in cognitive psychology and speech perception. The manifestation of phonological categories as the selection of exemplars can be taken to represent what is termed the phonetics/phonology interface, where both linguistic and para-linguistic factors affect the articulation and perception of words and segments. Many details need to be worked out; in particular, the proposal that the post-lexical component of the grammar is captured by exemplar-based categorization needs to be put to empirical testing, similar to Hammond (1990), which showed that flapping is conditioned not only by the phonology but by speech style, or Bergen (2002), which showed that French liaison is subject to myriad factors beyond the purely phonological. If it is indeed the case, however, then it frees the abstract grammar from having to account for opacity triggered by post-lexical processes; instead, this type of opacity is recast as a categorization problem.

6. SUMMARY

In this prospectus, I attempted to establish a theoretical framework for integrating exemplar-based approaches to linguistic categorization into a formal theory of phonology like OT. Integrating the two opens up the possibility of accounting for one of the trickier problems facing Optimality Theory: that of opacity. The first half of this prospectus established that all of the current ways of accounting for opacity are flawed empirically and/or because they are unlearnable. The suggested approach makes a number of testable predictions:

i. Post-lexical opacity should leave traces of the underlying phonemic categories, measurable through careful phonetic analysis.
ii. A truly opaque grammar should be unlearnable; it will either be regularized or the generalization will be lost.
iii. Language learners should be able to correlate the phonetic output with the appropriate phonological category given the appropriate environment, even if the phonetic detail renders it ambiguous.

A lot more work obviously needs to be done on the empirical side as well; the hypothesis that all opacity derives from post-lexical phonetic implementation effects or from morphological paradigm leveling is a strong claim that ultimately should be rooted in data exploration.

References:

Albright, Adam, and Hayes, Bruce. 2002. Modeling English past tense intuitions with minimal generalization. Paper presented at the Proceedings of the Sixth Meeting of the ACL Special Interest Group in Computational Phonology.
Alderete, John, Brasoveanu, Adrian, Merchant, Nazarre, Prince, Alan, and Tesar, Bruce. To appear. Contrast Analysis Aids the Learning of Phonological Underlying Forms. Paper presented at the Proceedings of WCCFL 24.
Anderson, John R. 1990. The Adaptive Character of Thought: Studies in Cognition. Hillsdale, NJ: L. Erlbaum Associates.
Anttila, Arto. 1997. Deriving Variation from Grammar. In Frans Hinskens, Roeland van Hout, and W. Leo Wetzels (eds.), Variation, Change, and Phonological Theory, 35-68. Amsterdam: John Benjamins.
Archangeli, Diana B., and Pulleyblank, Douglas George. 1994. Grounded Phonology. Cambridge, MA: MIT Press.


Barnes, Jonathan. 2002. Positional Neutralization: A Phonologization Approach to Typological Patterns. PhD dissertation, Linguistics, University of California, Berkeley.
Benua, Laura. 1997. Transderivational identity: phonological relations between words. PhD dissertation, University of Massachusetts, Amherst.
Bergen, Benjamin K. 2001. Of sound, mind and body: Neural explanations for non-categorical phonology. PhD dissertation, Linguistics, University of California, Berkeley.
Bermúdez-Otero, Ricardo. 1999. Constraint interaction in language change [Opacity and globality in phonological change]. PhD dissertation, University of Manchester / Universidad de Santiago de Compostela.
Bermúdez-Otero, Ricardo. 2003. The acquisition of phonological opacity. In Jennifer Spenader, Anders Eriksson, and Östen Dahl (eds.), Variation within Optimality Theory: Proceedings of the Stockholm Workshop on 'Variation within Optimality Theory', 25-36. Stockholm: Department of Linguistics, Stockholm University.
Blevins, Juliette. 2004. Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press.
Boersma, Paul, and Hayes, Bruce. 2001. Empirical Tests of the Gradual Learning Algorithm. Linguistic Inquiry 32:45-86.
Boersma, Paul, Escudero, Paola, and Hayes, Rachel. 2003. Learning Abstract Phonological from Auditory Phonetic Categories: An Integrated Model for the Acquisition of Language-Specific Sound Categories. Proceedings of the 15th International Congress of Phonetic Sciences, 1013-1016.
Booij, Geert. 1997. Non-derivational phonology meets lexical phonology. In I. Roca (ed.), Derivations and Constraints in Phonology. Oxford: Clarendon Press.
Bybee, Joan. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
Chomsky, Noam, and Halle, Morris. 1968. The Sound Pattern of English. New York: Harper and Row. Reprinted 1991, Cambridge, MA: MIT Press.
Cole, Jennifer, and Kisseberth, Charles W. 1994. An Optimal Domains theory of harmony. Studies in the Linguistic Sciences 24:2, 101-114.
Coleman, John, and Pierrehumbert, Janet. 1997. Stochastic phonological grammars and acceptability. Paper presented at Computational Phonology: Third Meeting of the ACL Special Interest Group in Computational Phonology.
Donegan, Patricia J., and Stampe, David. 1979. The study of natural phonology. In D. A. Dinnsen (ed.), Current Approaches to Phonological Theory, 126-173. Bloomington, IN: Indiana University Press.


Dresher, Elan. 1981. On the learnability of abstract phonology. In C. L. Baker and J. J. McCarthy (eds.), The Logical Problem of Language Acquisition, 188-210. Cambridge, MA: MIT Press.
Ferguson, Charles A., and Slobin, Dan Isaak (eds.). 1973. Studies of Child Language Development. New York: Holt, Rinehart and Winston.
Flemming, Edward. 1995. Auditory representations in phonology. Doctoral dissertation, UCLA.
Gafos, Adamantios I. 1999. The Articulatory Basis of Locality in Phonology. Outstanding Dissertations in Linguistics. New York: Garland.
Gnanadesikan, Amalia. 1995. Markedness and Faithfulness in constraints on child phonology. Ms., ROA #75.
Goldsmith, John A. 1993. The Last Phonological Rule: Reflections on Constraints and Derivations. Chicago: University of Chicago Press.
Goldsmith, John. 2001. Unsupervised Learning of the Morphology of a Natural Language. Computational Linguistics, 153-189.
Goldwater, Sharon, and Johnson, Mark. 2003. Learning OT Constraint Rankings Using a Maximum Entropy Model. Paper presented at the Workshop on Variation within Optimality Theory, Stockholm University.
Hammond, Michael. 2004. Gradience, Phonotactics, and the Lexicon in English Phonology. International Journal of English Studies 4:1-24.
Hayes, Bruce. 1999. Phonological Acquisition in Optimality Theory: The Early Stages. In R. Kager, J. Pater, and W. Zonneveld (eds.), Fixing Priorities: Constraints in Phonological Acquisition. Cambridge University Press.
Hayes, Bruce, Kirchner, Robert, and Steriade, Donca (eds.). 2004. Phonetically Based Phonology. Cambridge University Press.
Hudson Kam, Carla, and Newport, Elissa. 2005. Regularizing Unpredictable Variation: The Roles of Adult and Child Learners in Language Formation and Change. Language Learning and Development 1:151-195.
Hyman, Larry. 1977. Phonologization. In A. Juilland (ed.), Linguistic Studies Offered to Joseph Greenberg, 407-418. Saratoga: Anma Libri.
Hyman, Larry M., and Kisseberth, Charles W. 1998. Theoretical Aspects of Bantu Tone. CSLI Lecture Notes 82. Stanford, CA: CSLI Publications.
Inkelas, Sharon. 1998. Phonotactic blocking through structural immunity. In B. Stiebels and D. Wunderlich (eds.), Lexicon in Focus, 7-40. Berlin: Akademie Verlag.
Ito, Junko, and Mester, Armin. 2003. On the sources of opacity in OT: Coda processes in German. In C. Féry and R. van de Vijver (eds.), The Syllable in Optimality Theory. Cambridge University Press.


Johnson, Keith, and Mullennix, John W. 1997. Talker Variability in Speech Processing. San Diego, CA: Academic Press.
Kavitskaya, Darya. 2002. Compensatory Lengthening: Phonetics, Phonology, Diachrony. Doctoral dissertation, University of California, Berkeley.
Kager, René. 1999. Optimality Theory. Cambridge University Press.
Keller, Frank, and Asudeh, Ash. 2002. Probabilistic Learning Algorithms and Optimality Theory. Linguistic Inquiry 33:225-244.
Kenstowicz, Michael, and Kisseberth, Charles W. 1977. Topics in Phonological Theory. New York: Academic Press.
Kiparsky, Paul. 1968. Linguistic Universals and Linguistic Change. In Emmon Bach and Robert T. Harms (eds.), Universals in Linguistic Theory, 170-210. New York: Holt, Rinehart & Winston.
Kiparsky, Paul. 1972. Explanation in Phonology. In Stanley Peters (ed.), Goals of Linguistic Theory, 189-225. Englewood Cliffs, NJ: Prentice-Hall.
Kiparsky, Paul. 1973. Phonological Representations. In Osamu Fujimura (ed.), Three Dimensions of Linguistic Theory, 1-136. Tokyo: TEC Company.
Kiparsky, Paul. 2000. Opacity and cyclicity. The Linguistic Review 17:351-366.
Krämer, Martin. 2003. Vowel Harmony and Correspondence Theory. Studies in Generative Grammar 66. Berlin: Mouton de Gruyter.
Labov, William. 1966. The Social Stratification of English in New York City. Washington, DC: Center for Applied Linguistics.
Lakoff, George. 1987. Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: University of Chicago Press.
Lamberts, Koen, and Shanks, David R. 1997. Knowledge, Concepts and Categories. Cambridge, MA: MIT Press.
Levi, Susannah V. 2000. Modern Hebrew: A challenge for sympathy. University of Washington Working Papers in Linguistics 20:1-14. Also ROA #758-0705.
MacWhinney, Brian. 2000. The CHILDES Project (3rd edition). Mahwah, NJ: Lawrence Erlbaum.
McCarthy, John J. 2003. Comparative Markedness. Theoretical Linguistics 29:1-51.
Sanders, Nathan. 2003. Opacity and Sound Change in the Polish Lexicon. PhD dissertation, Linguistics, UC Santa Cruz.
Seidl, Amanda, and Buckley, Eugene. 2005. On the Learning of Arbitrary Phonological Rules. Language Learning and Development 1:289-316.
Smolensky, Paul. 1996. On the comprehension/production dilemma in child language. Linguistic Inquiry 27:720-731.


Tesar, Bruce, and Smolensky, Paul. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Tesar, Bruce, and Smolensky, Paul. 2004. Using phonotactics to learn phonological alternations. Paper presented at CLS 39, Vol. II: The Panels, Chicago.
Walker, Rachel. 2000. Nasalization, Neutral Segments, and Opacity Effects. Outstanding Dissertations in Linguistics. New York: Garland.
Wedel, Andrew. 2004. Category competition drives contrast maintenance within an exemplar-based production/perception loop. Paper presented at the Workshop of the ACL Special Interest Group on Computational Phonology, Barcelona.

