Optimality Theory and Exemplar Theory

Jeroen van de Weijer
Leiden University

ABSTRACT
In this paper I identify a number of problematic areas within “standard” Optimality Theory. In each of these areas active research is under way and progress is being made. For possible solutions to some of the problems currently facing phonological theory, I draw attention to the psycholinguistic framework of Exemplar Theory. A preliminary proposal as to how Optimality Theory and Exemplar Theory might be combined to provide a comprehensive, psycholinguistically realistic theory of linguistics is presented.

Keywords: Optimality Theory, Exemplar Theory, phonological theory, psycholinguistics
1. Introduction

In a recent article, Jackendoff (2007) outlines six core problems for linguistics in the 21st century. Among these is the topic of this paper, viz. the need to integrate linguistic theory with psycholinguistics, and also to relate linguistic knowledge to general human cognitive capabilities. Only by integrating adequate theories within these two realms will it be possible to arrive at a general, unified theory of human language, which (i) takes seriously the autonomy of the single speaker and his native competence (encompassing both perception and production), (ii) provides an adequate model of the interaction between speakers (which involves the notion of harmony, or cooperation in communication) and (iii) gives a responsible and predictive account of the typological possibilities of variation between languages. In short, the “microlevel” (the individual speaker), communication between speakers, and the “macrolevel” (variation between languages) should all be accounted for.

In this paper I will first outline some challenges that are currently being faced within what is arguably the most promising and attractive theory in theoretical phonology, i.e. Optimality Theory (OT; Prince & Smolensky 1993/2004). A number of issues will be outlined, viz. psycholinguistic credibility, the role of production and perception, and the way in which OT deals with variation and frequency effects. All these topics could be seen as problematic in classical OT, and have been tackled in various ways in ongoing research. Then we will focus on a particularly attractive psycholinguistic theory, viz. Exemplar Theory (ET), as explored, for instance, in recent work by Bybee (2006). ET will be seen to provide a number of solutions for the problems faced by OT, although it is not a perfect theory itself in a number of ways, especially with respect to the areas of communication and typology mentioned above.
In the final part of the paper I will outline the rough contours of a combined theory of OT and ET, which retains the advantages of both approaches but which is not liable to the deficiencies of either.

2. OT and ET

In this section I will examine the advantages and disadvantages of two prominent theories of linguistics that bear on the topics discussed above. These are Optimality Theory, a relatively abstract theory, and Exemplar Theory, which is a usage-based theory with roots in psychology.

2.1. OT

First, let us outline some advantages and areas of possible improvement in OT, focusing not so much on the analyses of isolated phenomena in individual languages, where I think it is fair to say that OT has been extremely influential, despite remaining problems like opacity (see McCarthy (2006, 2008) for a possible new approach), as on the general design of the theory.

A first important strength of OT is, in my view, its modelling of the situation of communication. It may be surprising to view OT as modelling (human) communication, so it is appropriate to enlarge on this point. Communication, simply put, involves putting across a meaning between a speaker and a hearer. The speaker’s meaning needs to be expressed, either in words or in signs (as in sign language), and after the message has been conveyed through some medium it needs to be perceived and understood. It is important to realize that speakers and hearers, while sharing the same communicative goal, have different interests (see also van de Weijer (2007a,b)). Speakers will try to put their message across using a minimal amount of effort. They will assimilate segments, elide segments, mistime articulatory gestures, and simplify syllable structure. These tendencies, which are based on simple laws of phonetics, are expressed by markedness constraints in OT, which may all be viewed as particular instantiations of the LAZY constraint family of Kirchner (1998). This is also the reason why markedness constraints should be grounded in phonetics. Compared to the speaker, the hearer/listener has opposite interests. The hearer’s job is word recognition. It is beneficial to the listener if words have not been distorted by elisions or assimilations, and if articulatory gestures are perfectly timed.
Ideally, from the perspective of the hearer, there should be a direct match or mapping from what she hears to whatever forms she has in her lexicon. In OT, this demand, which is clearly based on word recognition, is expressed by faithfulness constraints, which favour as little deviation from input forms as possible. Note that we thus derive the OT premise that the markedness and faithfulness constraints are universal: articulatory systems do not differ much between speakers around the globe, and word recognition skills are also basic human cognitive skills. Thus, speakers and hearers have the same interest, communication, but approach it from different angles. The “mediation” system between them is the grammar. Different languages will select different solutions to resolve the conflict, and OT reflects this by ordering constraints differently in different languages, and also across different styles of speech. It is important to note that both the interests of the speaker and those of the hearer are reflected in OT grammars.

A second advantage of OT lies in the realm of typology. Different constraint hierarchies define different languages, so there is a direct and, in principle, verifiable prediction as to which languages are possible and which are not. By way of the “factorial typology” OT defines the bounds of human languages, in a way that rule-based theory could never achieve.
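To make the notion of factorial typology concrete, the following is a minimal sketch in Python. It assumes a hypothetical input /pat/ with three candidates and an invented violation profile over the familiar constraints NoCoda, Max and Dep; none of these forms or counts come from the paper. Each total ranking is evaluated by strict domination, implemented here as a lexicographic comparison of violation vectors.

```python
from itertools import permutations

# Hypothetical toy input /pat/: three candidates with illustrative violation counts.
CANDIDATES = {
    "pat":  {"NoCoda": 1, "Max": 0, "Dep": 0},  # faithful, keeps the coda
    "pa":   {"NoCoda": 0, "Max": 1, "Dep": 0},  # deletes the final consonant
    "pata": {"NoCoda": 0, "Max": 0, "Dep": 1},  # inserts a final vowel
}
CONSTRAINTS = ("NoCoda", "Max", "Dep")


def winner(ranking, candidates=CANDIDATES):
    """Return the optimal candidate under strict domination."""
    # Violation vectors are compared lexicographically, highest-ranked constraint
    # first, so a lower-ranked constraint only matters when all higher ones tie.
    return min(candidates, key=lambda c: tuple(candidates[c][k] for k in ranking))


def factorial_typology(constraints=CONSTRAINTS):
    """Map every total ranking of the constraints to its winning output."""
    return {ranking: winner(ranking) for ranking in permutations(constraints)}


typology = factorial_typology()
for ranking, output in typology.items():
    print(" >> ".join(ranking), "->", output)
```

With three constraints there are six rankings, but in this toy case only three distinct output patterns, which illustrates how several hierarchies can define the same predicted language type.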
It is up to linguists to find languages that match these hierarchies, keeping in mind that constraints should be well-motivated from a phonetic and psycholinguistic perspective. Factorial typologies play an important role in work on stress systems, reduplication systems, patterns of voicing assimilation, etc. In recent work, Coetzee (2008) shows that even constraints that are low in the hierarchy, and therefore do not normally play a role in the language, can be proven to exist by psycholinguistic experiment: such constraints are shown to function in the relative acceptability of non-words. Together with the factorial typology this is strong confirmation for the OT position that constraints are shared across languages. A third, related, advantage is that OT offers a very straightforward “production mechanism”, in the sense that there is a clear, explicit strategy as to how an underlying form (“input”) is mapped onto a winning candidate (“output”): as is well-known, the candidate that satisfies the highest-ranked constraints is victorious over other candidates, regardless of the number of violation marks and regardless of the score on lower-ranked constraints. This production mechanism is also maximally transparent in the sense that it does not assume any number of intermediate, abstract levels (at least in the classical version of the theory). We thus count three important areas of success in OT. Let us turn to the challenges which classical OT faces and which have been addressed in recent work. First, connected to the view that OT basically offers a production mechanism, it is not so clear how OT deals with perception and/or word recognition at all. It is clear that perception plays a role (in the faithfulness constraints, cf. above), but there is no separate explicit “perception/recognition” module, i.e. a mechanism or strategy that says how to match an incoming phonetic form (“output”) to a possible underlying form (“input”). 
This has been a subject of active investigation and partially repaired in recent models, e.g. in Paul Boersma’s Functional Phonology model (Boersma 1998) or his more recent models, such as Stochastic OT (Boersma (2007) and references there). These models have successfully been tested against complex acquisitional data and intricate patterns of variation. Still, some remarks are in order. In (1) I present the model of grammar taken from Boersma (2007: 2032).

(1)  |Underlying Form|
          faithfulness constraints
     /Surface Form/           structural constraints
          cue constraints
     [Auditory Form]          auditory constraints
          sensorimotor constraints
     [Articulatory Form]      articulatory constraints

In this model there are four separate levels and six different types of constraints relating them. The “surface form” is still a quite abstract form, and perhaps it is possible to do without the distinction between articulatory form and auditory form. One may wonder whether it might not be more restrictive to preserve the original idea of OT, with two levels (underlying and surface) and with two types of constraints: markedness and faithfulness (or rather: speaker-oriented constraints and hearer-oriented constraints, in which case alignment constraints could be classed as constraints of the latter type).

A second area of intense research concerns variation and frequency effects. One of the advantages of OT is that it always selects one output in a maximally transparent way (see above). But what if output data is variable? One remark should be made right away: an actual production event is never variable. A speaker produces only one single output on any one occasion, so each utterance is, or can be, the result of a single deterministic grammar. If variation occurs, we should ask whether it is the result of small changes in the grammar, so that a different winner may emerge each time a form is evaluated, or whether the variation is part of the grammar itself (or both). This question is by no means settled, but it again indicates the importance of distinguishing between the individual (microlevel variation) and the speech community (macrolevel variation). Perhaps the latter is nothing more than the sum of the individual patterns of variation, i.e. epiphenomenal. Different ways to model this have attracted great attention in the past decades, e.g. in the work by Anttila (1997, 2006), Hayes (2000), Boersma and Hayes (2001) and Boersma (2007). Related to variation are frequency-related effects: lexical items that are frequently used behave differently in various ways from items that are seldom used (see e.g. Bybee (2006)).
The role of frequency poses a dilemma for traditional theories of grammar: how do speakers “keep track” of the frequency of a certain item in the community? This information is not normally considered to be stored in underlying representations, nor is it, of course, inborn. We will return to this dilemma below. A possible connection in this respect is (Noisy) Harmonic Grammar (see Coetzee and Pater (2008), who note the increased attention to variation in phonological theory). In Noisy Harmonic Grammar, constraints get numerical weights in evaluation, and the stars which are usually employed in OT tableaux are replaced by numerical values. The candidates’ values and the constraints’ weights are multiplied and the products summed, and the candidate with the highest “harmonic value” is selected. We can envisage a situation in which the frequency of particular candidates is taken into account in this calculation, but of course this would not make sense in a traditional OT grammar, in which there is an infinite number of candidates (cf. Richness of the Base).

Finally, again regarding the relation between OT and psycholinguistics, in classical OT there is an infinite set of candidates and the set of possible outputs is also infinite. This may or may not be a problem linguistically or computationally, but it would be advantageous from a processing point of view if the selection mechanism were a little more tractable. In recent versions of OT like that involving candidate chains (e.g. McCarthy (2006, 2008)), the notion that the number of candidates is infinite is given up. Rather, only a limited
number of operations are possible on any input, and only steps that improve that candidate are evaluated, giving us a better handle on problems of opacity. Second, in classical OT all constraints are innate, which may or may not be a necessary assumption. From a learnability perspective, it would be better if constraints were learned, just like other general cognitive skills. To summarise, we have identified three areas where OT is particularly strong (communication, typology and production) and three areas of active research (perception, variation/frequency, and psycholinguistic credibility). With these in mind, let us turn to psycholinguistics. 2.2. Exemplar Theory In psycholinguistics the focus is not on languages as wholes but on individual speech behaviour. One theory that is influential is Exemplar Theory (henceforth: ET), a theory that has come from the psychology literature on categorization, and which has been applied to linguistics by researchers such as Bybee (2006 and references cited there). Space limitations make a detailed description of this theory impossible here. One major difference between standard theories of linguistics and ET is that in the latter there is a much greater role for storage of forms. Two quotes from Bybee (2006) are given in (2). (2) In exemplar theory, every token of experience is classified and placed in a vast organizational network as part of the decoding process. New tokens of experience are not decoded and discarded, but rather they impact memory representations. In particular, a token of linguistic experience that is identical to an existing exemplar is mapped onto that exemplar, strengthening it. Tokens that are similar but not identical (differing in slight ways in meaning, phonetic shape, pragmatics) to existing exemplars are represented as exemplars themselves and are stored near similar exemplars to constitute clusters or categories. (Bybee (2006), p. 
716)

Thus, instead of a dictionary-like lexicon as in standard (generative) grammar, lexical items are stored in a network-like multi-dimensional organization: items that are similar are stored close to each other. This has psycholinguistic advantages, e.g. mispronunciations will often pick out a form which is close to the intended form. If a certain item is subject to variation, then both variants will be stored, roughly in proportion to the frequency with which they are encountered. Variation is thus a natural part of the lexicon in an ET grammar. Note, incidentally, that exemplar clouds can be described in OT terms: a difference in a single feature could be conceived of as an IDENT difference, and a difference in the presence or absence of a sound could be described as a MAX or DEP difference. In this way, the OT formalism of Correspondence could be useful. Items that are frequently heard strengthen each other in the lexicon. This “level of entrenchment” reflects, but is not identical to, the frequency of a certain item in a speech community. In this way, tokens wear a certain weight on their sleeves, which can be expressed as a numerical value, and which will be related to their frequency of occurrence in a speech community. The idea that more exposure leads to more storage is squarely incompatible with the usual premise in generative grammar that only non-redundant information is stored in the lexicon. Although this premise has constrained possible solutions for phonological problems in the past, this general principle of economy is not upheld in ET. The evidence about the human brain in recent years has pointed in the direction of more storage (see also Ladefoged (1972), p. 282).

Focusing on the microlevel, ET is a very good theory of individual speech behaviour. It has been applied with particular success in the area of word recognition, e.g. where a subject hears a larger and larger portion of a word and eventually recognises it. The idea is that a cloud of exemplars is activated up to the point at which the word is recognised. Bybee (2006) shows that ET is also an excellent theory to account for various effects of frequency. Since entrenchment is a function of frequency, this theory accounts for such effects in a direct and simple way. A related issue is the role of frequency in language change, which is also a subject of great interest. Bybee shows that the model also makes interesting and correct predictions in this area.

ET in its current form also leaves open a number of questions that are of great interest to general linguists. First, it is unclear just how much information is stored in an ET lexicon, in what detail, etc. It is obvious that short phrases and idioms are stored as wholes, and not computed on-line. Where does storage end, however? How fast does the memory of words and phrases decay? Is there more to know about the mechanisms of storage? These are ongoing questions of research and more needs to be done in this area. A second desideratum concerns questions of a typological nature. To what extent can languages differ? It is easy to design a language that is not actually found across the languages of the world.
ET makes no predictions at all in this respect: the language data that speakers are exposed to are the language data that speakers are exposed to. If there are any exceptionless generalisations that can be made across these data, these generalisations are epiphenomenal, and not a function of the design of language or of anything else. Any tendencies are merely statistical. Many will not be ready to give up the central goal of linguistics to find out the limits of human linguistic knowledge, and to try and devise a model which at least describes, and hopefully explains, these limitations. A final area in which ET might be improved is its production mechanism. There is, as far as I know, no clear consensus on how a particular exemplar token is selected for production. Imagine a speaker who wishes to pronounce cat and for whom a cloud of variants is available. How will the speaker select the particular token for production at that particular time? This is something of a mystery in Exemplar Theory. In short, ET lacks a grammar for production.

3. Compromising

In (3) we tally up the strengths (s) and weaknesses (w) of OT and ET on the basis of the preceding section.
(3)       OT                                ET
     s    Typological predictions           Psychological / cognitive credibility
          Modelling communication           Variation and frequency effects
          Production                        Perception / word recognition
     w    Psycholinguistic credibility      No typological predictions
          Variation and frequency           No production mechanism
          Perception

It is seen that the strengths we identified in OT are precisely the weaknesses of ET, and vice versa. For some, this would be reason to abandon either theory completely and try to repair the weaknesses in the other. Keeping in mind Jackendoff’s (2007) advice to promote collaboration between adjacent fields, however, let us explore a hybrid theory: one which takes the best of both and tries to avoid the pitfalls of either. In this brief space, it is only possible to sketch out some fundamental design properties of such a combined theory.

In the combined theory, there is no infinite set of candidates, as there is in classical OT. Instead, the candidates are the exemplar tokens as in ET, so they are individual-specific and come with frequency-related information, which is a direct function of the degree of exposure to a particular item. The evaluation metric, i.e. the grammar, picks out one of these, viz. the candidate that is best suited for use on that particular occasion (taking into account stylistic factors, for instance). This grammar is an Optimality Grammar: the token is picked out by a familiar OT constraint hierarchy. On the one hand, this gives ET the production part that it currently lacks. On the other hand, the great advantage of keeping constraints is that we can continue to make typological predictions: since constraints are preserved, factorial typology is also preserved and we continue to make testable predictions as to the possible bounds of human language(s). As to the exact mechanism of production, there are two possibilities. The first is that a strategy of abstraction is posited in speakers, as a result of which they can posit an underlying form on the basis of available tokens. The second is that a specific token is marked as the underlying form, so that this has a special status among the exemplars. The phonological grammar could change this token into another one if for instance the phonological environment gives occasion to assimilate.
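The design just described can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the exemplar forms and counts for cat, the constraint definitions (a speaker-oriented stand-in `no_final_t`, and simplified position-by-position versions of MAX and IDENT), the two stylistic rankings, and the use of entrenchment as a tie-breaker are all invented and are not proposed in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    form: str   # stored phonetic shape of the token
    count: int  # times this speaker has encountered it (a proxy for entrenchment)


# Hypothetical exemplar cloud for one speaker's word "cat"; forms and counts are invented.
CLOUD = (Exemplar("kat", 60), Exemplar("kaʔ", 30), Exemplar("ka", 10))
REFERENCE = "kat"  # assumption: the token with special status among the exemplars


def no_final_t(ex):
    """Speaker-oriented stand-in for an effort constraint: avoid final [t]."""
    return 1 if ex.form.endswith("t") else 0


def max_seg(ex):
    """Hearer-oriented MAX (simplified): one mark per missing reference segment."""
    return max(0, len(REFERENCE) - len(ex.form))


def ident_seg(ex):
    """Hearer-oriented IDENT (simplified): one mark per differing segment."""
    return sum(a != b for a, b in zip(ex.form, REFERENCE))


def select(cloud, ranking):
    """Pick the stored token that is optimal under a strict constraint ranking."""
    def profile(ex):
        marks = tuple(constraint(ex) for constraint in ranking)
        # Assumption of this sketch: remaining ties go to the more entrenched token.
        return marks + (-ex.count,)
    return min(cloud, key=profile)


CAREFUL = (max_seg, ident_seg, no_final_t)  # hearer-oriented constraints on top
CASUAL = (no_final_t, max_seg, ident_seg)   # speaker-oriented constraint on top

print("careful:", select(CLOUD, CAREFUL).form)
print("casual: ", select(CLOUD, CASUAL).form)
```

The candidate set here is finite and speaker-specific, and a change of ranking across styles selects a different stored token. The sketch follows the second possibility mentioned above, in which one stored token serves as the reference point for the hearer-oriented constraints and no separate abstract form is posited.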
This would accord well with ideas such as those in Burzio (2000) and related work, in which there are “no underlying forms” and all phonology is basically Output-Output correspondence. This needs further contemplation.

As pointed out before, using Exemplar Theory will provide an inroad to accounting for variation and frequency-related effects. In such a model, it is clear that acquirers of the language can learn patterns of variation, that is, how frequently they are exposed to different forms of the same “word”, and that they can also learn whether certain items are more frequent than others. Both kinds of knowledge are expressed in the ET lexicon, by the presence and strength of individual exemplars. Both kinds of knowledge are therefore available to the phonological grammar, that is, the constraint hierarchy. In this way, we will derive the kind of grammar proposed by Coetzee and Pater (2008), i.e. Noisy Harmonic Grammar, without extra stipulations.

4. Conclusion

It is important to return to the relation between the individual and the community in which the individual speaks with others. In the long term, we aim at a model that respects the autonomy of the single speaker’s linguistic art and his or her harmonious behaviour in conversation, and that safeguards typology. Linguists from different spheres will need to make a joint effort by combining the very tiny with the very broad.
References

Anttila, Arto (1997) “Deriving Variation from Grammar,” Variation and Change in Phonological Theory, ed. by Frans Hinskens, Leo Wetzels, and Roeland van Hout, 35-68. Amsterdam: Benjamins.
Anttila, Arto (2006) “Variation and Opacity,” Natural Language & Linguistic Theory 24, 893-944.
Boersma, Paul (1998) Functional Phonology - Formalizing the Interactions between Articulatory and Perceptual Drives. PhD dissertation, University of Amsterdam.
Boersma, Paul (2007) “Some Listener-Oriented Accounts of h-aspiré in French,” Lingua 117, 1989-2054.
Boersma, Paul & Bruce Hayes (2001) “Empirical Tests of the Gradual Learning Algorithm,” Linguistic Inquiry 32, 45-86.
Burzio, Luigi (2000) “Cycles, Non-Derived Environment Blocking, and Correspondence,” Optimality Theory – Phonology, Syntax and Acquisition, ed. by Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer, 47-87. Oxford: Oxford University Press.
Bybee, Joan (2006) “From Usage to Grammar: The Mind’s Response to Repetition,” Language 82, 711-733.
Coetzee, Andries W. (2008) “Grammaticality and Ungrammaticality in Phonology,” Language 84, 218-257.
Coetzee, Andries W. and Joe Pater (2008) “The Place of Variation in Phonological Theory” (draft January 2008). To appear in the second edition of the Handbook of Phonological Theory, ed. by John Goldsmith, Jason Riggle and Alan Yu. London: Blackwell.
Grice, Paul (1957) “Meaning,” The Philosophical Review 66, 377-388.
Hayes, Bruce (2000) “Gradient Well-formedness in Optimality Theory,” Optimality Theory – Phonology, Syntax, and Acquisition, ed. by Joost Dekkers, Frank van der Leeuw and Jeroen van de Weijer, 88-120. Oxford: Oxford University Press.
Jackendoff, Ray (2007) “A Whole Lot of Challenges for Linguistics,” Journal of English Linguistics 35, 253-262.
Kirchner, Robert (1998) An Effort-based Approach to Consonant Lenition. Doctoral Dissertation. University of California, ROA 276.
Ladefoged, Peter (1972) “Phonetic Prerequisites for a Distinctive Feature Theory,” Papers in Linguistics and Phonetics to the Memory of Pierre Delattre, ed. by Albert Valdmann, 273-285. The Hague: Mouton.
McCarthy, John (2006) “Candidates and Derivations in Optimality Theory,” Ms., ROA 823.
McCarthy, John (2008) “The Gradual Path to Cluster Simplification,” Phonology 25, 1-49.
Prince, Alan & Paul Smolensky (1993/2004) Optimality Theory: Constraint Interaction in Generative Grammar. Ms., Rutgers University and University of Colorado. Published 2004, London: Blackwell.
van de Weijer, Jeroen (2007a) Compromising in the Communication Conflict: The Demands of Speakers, the Demands of Listeners and what to do about it. Paper presented at Shenzhen University (PRC), April 2007.
van de Weijer, Jeroen (2007b) Communication is only Possible because we have Grammar. Paper presented at the Northeast Normal University, Changchun (PRC), July 2007.