Two ways of formalizing OT Syntax in the LFG framework
Jonas Kuhn
DRAFT, May 1999
1 Introduction

Optimality Theory (OT, Prince/Smolensky 1993) as a general framework for linguistic constraint systems has been applied to a number of areas of linguistic research, first in phonology, later also in syntax and morphology. Assuming the grammar of a language to be specified through a particular dominance ranking over a set of conflicting universal constraints provides high explanatory strength within a very simple system. The ranking determines which of several competing candidate analyses is the most harmonic one (triggering the least serious constraint violations) in the given language. Languages differ only in the relative ranking of the constraints. But it is not only the typological dimension that makes OT attractive; the approach may also give rise to more general language-specific accounts: the concept of constraints that may be violated in an analysis in order to satisfy more highly ranked constraints allows for clear and simple formulations of linguistic principles, even in the face of complex constraint interaction as it will occur in any non-trivial syntax fragment. To date, most work in OT syntax has focused on fairly restricted sets of empirical data, such that little can be said on whether the system does effectively scale up to a realistic amount of constraint interaction. This is in part due to the fact that OT is still a young field of study, but it seems also that there is a limit to the size of an OT analysis that can be mastered on a piece of paper. A thorough assessment of the benefits of OT presupposes computational devices that allow one to manipulate larger sets of constraints and larger candidate sets with more complex candidate analyses. For the application domain of phonology, recent work on formalization (Frank/Satta 1998; Karttunen 1998) demonstrates that OT can be integrated in the finite-state tradition of computational phonology.
For the syntactic domain, Bresnan proposes in a number of papers (see, e.g., Bresnan 1996; Bresnan 1998c) to integrate OT with the syntactic framework of Lexical-Functional Grammar (LFG, Kaplan/Bresnan 1982), whose computational properties have been studied extensively (see, e.g., contributions in Dalrymple et al. 1995).2 LFG's nonderivational system of correspondence between parallel structures lends itself to making the assumptions of OT syntax precise.

Institut für maschinelle Sprachverarbeitung, Universität Stuttgart, Azenbergstr. 12, D-70174 Stuttgart, Germany, Email: [email protected], WWW: http://www.ims.uni-stuttgart.de/jonas/.

Footnote 2: With a view to the mentioned perspective of implementing larger OT fragments, we may note in addition that in the Xerox Linguistic Environment (XLE), there exists an implementation of the LFG formalism
I will refer to the optimality-theoretic LFG model as OT-LFG. Even in pure LFG, recent work on the general architecture of the mapping from constituent (c-)structure to functional (f-)structure assumes an economy principle (Economy of Expression; cf. the overview in Bresnan 1998a, ch. 6-7), and thus relies on a comparison between competing candidate analyses. To capture this formally, a mechanism similar to the candidate evaluation in OT is required. As Johnson (1998) also notes, there are still a number of open questions in the context of formalizing OT in LFG. The present paper contributes to the discussion, investigating two alternative ways of integrating OT in the LFG framework. Both alternatives require only minor modifications of the underlying LFG architecture, and experimental versions of them have been applied to small syntax fragments. The first one is fairly close to Bresnan's formulation of OT-LFG and uses the ranking mechanism of Frank/King/Kuhn/Maxwell (1998) (which they employ to decide between alternative analyses in parsing) on the set of generation alternatives for a given input f-structure. Since this approach requires that the candidate analyses be effectively constructed prior to the optimizing competition, it may face serious complexity problems when scaled up. The second formalization attempts to avoid the need of an online competition between effectively generated candidates by transferring the insight of Karttunen (1998) for OT phonology to LFG rules: Karttunen shows that for finite-state phonology, the optimizing competition can be precompiled. Covering two fairly different approaches in this relatively short paper, I can neither go into much formal detail, nor can I spend much time on linguistic motivation. The aim is rather to sketch some of the possible directions an exact formalization might take. It is certainly too early for a definite assessment of the two options.
Many of the decisions one has to take in such a formalization concern notions that still seem to be in a state of flux in the linguistic literature. Ideally, there should be a cross-fertilization between linguistic work and work on the formalism. The paper is structured as follows: after providing some further background on OT in section 2, the two formalizations are addressed in turn, in section 3 and section 4. In the concluding section 5, I will briefly discuss some of the possible implications of the two formalizations.
2 Background

Let us start with an illustrative example of an analysis in OT syntax. It is taken from Grimshaw's (1997) account of inversion in English, which is set in a syntactic framework working with a representational simulation of movement derivations. (Bresnan (1998c, sec. 2) shows that Grimshaw's constraint system can be reconstructed in the LFG framework, and the examples I use to illustrate the formalizations in the present paper will also be based on this fragment.) Assume the constraints in (1) are members of the universal inventory of syntactic constraints (Grimshaw 1997, 374).

Footnote 2 (continued): that is designed for non-trivial grammar fragments (cf. Butt/King/Niño/Segond to appear), providing an interface to morphological analyzers and the capability of processing large lexicons. The system even provides a particular constraint ranking mechanism, as discussed in (Frank et al. 1998). It is an interesting question if and how OT approaches from the theoretical literature could be integrated in such a system.
(1)  Op-Spec  Syntactic operators must be in specifier position.
     Ob-Hd    A projection has a head.
     Stay     Trace is not allowed.
For English, the dominance ranking is as follows: Op-Spec ≫ Ob-Hd ≫ Stay, i.e., it is more important that an analysis satisfy Op-Spec than Ob-Hd or Stay, etc. This ranking is taken into account when, for a given underlying representation (an input), the grammatical form is determined. Given an input representation, there is a universal range of candidate analyses that compete for the status of the grammatical one. (In different languages, different candidates may win.) The function that takes an input to a set of candidate analyses is called Gen. The first column of the table in (2) shows some sample candidates that are contained in the set that Gen assigns to the representation underlying the English question what will she read (Grimshaw 1997, 378).

(2)
Candidates                                      Constraint violations
[IP she will [VP read what]]                    *Op-Spec
[CP what e [IP she will [VP read t]]]           *Ob-Hd, *Stay
[CP what will_i [IP she e_i [VP read t]]]       *Stay, *Stay
For each candidate it is checked which of the constraints it satisfies; a violation is marked with an `*' (e.g., the first candidate has the wh-operator what in the complement position of the verb, thus failing to satisfy the constraint Op-Spec in (1)). Formally, the function marks assigns a multiset of constraint violations to each analysis. Based on this marking of constraint violations for all analyses in the candidate set, and the constraint hierarchy for a particular language, the function Eval determines the most harmonic, or optimal, candidate: the grammatical analysis. (There may also be a set of equally harmonic candidates.) Of two competing candidates, the more harmonic one is defined to be the one that contains fewer violations of the highest-ranked constraint on which the markings of the two differ. The result of the evaluation is standardly notated in a so-called tableau (3), with the columns for the constraints reflecting the hierarchy of the language under consideration.

(3)
   Candidates                                      Op-Spec   Ob-Hd   Stay
   [IP she will [VP read what]]                      *!
   [CP what e [IP she will [VP read t]]]                       *!      *
☞ [CP what will_i [IP she e_i [VP read t]]]                           **
If a candidate loses in a pairwise comparison, the fatal mark is highlighted with an `!' (e.g., the first candidate is less harmonic than the second one, since they differ in the highest-ranked constraint Op-Spec). Note that the score that the losing candidate has for lower-ranked constraints is completely irrelevant. Ultimately, the candidate that remains without a fatal constraint violation is marked with the symbol ☞ as the winner of the entire competition. In the example, the bottom analysis is optimal, although it violates the constraint Stay twice. The other analyses are predicted to be ungrammatical. Note that there will always be at least one winning analysis for a given (nonempty) candidate set, since optimality is defined relative to the competitors. After this informal example, we can identify the notions that a formalization of OT must pinpoint and moreover capture in a computationally tractable way: the input representation, the function Gen, the formulation of constraints, the function marks checking for constraint violations, and the function Eval.3 For some of these concepts, the assumptions made in different incarnations of OT vary significantly. In the bulk of this paper, I adhere to the standard interpretation of optimality as singling out the grammatical analysis against its competitors, which are throughout ungrammatical (for the underlying representation given in the input). This contrasts with the notion of optimality as preference among several grammatical analyses adopted in (Frank et al. 1998), and F. Keller's (1998) extended concept of OT, which also covers gradedness in grammaticality judgements. In section 3.5, I will briefly come back to such other interpretations.
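To make the evaluation procedure concrete, the pairwise harmony comparison can be sketched as follows. This is my own illustrative encoding, not part of the paper's formalization: candidates carry multisets of violation marks as in tableau (3), and Eval reduces to lexicographic comparison of violation counts ordered by the language's ranking. The function names `profile` and `optimal` are hypothetical.

```python
from collections import Counter

# English ranking: Op-Spec >> Ob-Hd >> Stay
RANKING = ["Op-Spec", "Ob-Hd", "Stay"]

# candidates and their marks multisets, following tableau (3)
candidates = {
    "[IP she will [VP read what]]": Counter({"Op-Spec": 1}),
    "[CP what e [IP she will [VP read t]]]": Counter({"Ob-Hd": 1, "Stay": 1}),
    "[CP what will_i [IP she e_i [VP read t]]]": Counter({"Stay": 2}),
}

def profile(marks, ranking):
    # violation counts ordered by rank; comparing these tuples
    # lexicographically implements the pairwise harmony comparison
    return tuple(marks[c] for c in ranking)

def optimal(cands, ranking):
    # Eval: return all most harmonic candidates (usually a singleton)
    best = min(profile(m, ranking) for m in cands.values())
    return [c for c, m in cands.items() if profile(m, ranking) == best]

print(optimal(candidates, RANKING))
# the inversion candidate wins despite its two Stay violations
```

Note that the double Stay violation is harmless exactly because the profiles differ already on a higher-ranked constraint, mirroring the `!' notation of the tableau.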
3 A straightforward OT model in LFG

The first of the two models I discuss follows the proposal of Bresnan (1998c), assuming the function Gen to be specified as a classical LFG grammar. Such a grammar relates an f-structure-based input to a full LFG analysis, on which constraints can be applied; thus a comparison among all analyses associated with a given input can model an OT competition.
3.1 The input and Gen
The specification of the input. Bresnan (1998c, sec. 2) presents a relatively faithful reconstruction of Grimshaw's (1997) OT system with the formal means of LFG.4 The input to Gen is a (possibly underspecified) feature structure representing some given morphosyntactic content independent of its form of expression (Bresnan 1998c, sec. 1.1). An example (that in English would have I saw her as its optimal realization) is (4).
(4)  [ pred  'see⟨x,y⟩'
       gf1   [ pred  'pro'
               pers  1
               num   sg  ]_x
       gf2   [ pred  'pro'
               pers  3
               num   sg
               gend  fem ]_y
       tns   past           ]
We may assume that for a particular OT system, the degree of specification of such a feature structure can be defined precisely. It will crucially contain all pred values, and the mapping from argument slots to f-structures (without the grammatical functions gf being necessarily specified). In addition, particular semantically relevant features, like tense (tns) and number (num), are specified. In order to arrive at a full specification, the underspecified functions have to be fixed to one option from a finite choice; furthermore, feature-value pairs (from a finite set of possibilities) may have to be added. Note however that it makes sense to assume that no new recursively embedded f-structure needs to be added. So we can note the following restriction (see (Wedekind to appear) and below for discussion).

(5)  Restriction on the extension of input structures (informal)
     For each partially specified input f-structure there is only a finite number of fully specified f-structure extensions, which can be determined on the basis of the grammar specification.

Practically, this can be achieved by allowing the addition of features in generation only for a specific list of features with a finite range of values.

Footnote 3: I will not say anything about the learning of a constraint hierarchy, which has received much attention in the OT literature, cf. (Tesar/Smolensky 1998).

Footnote 4: The input that Grimshaw assumes for a verbal extended projection consists of a lexical head plus its argument structure and an assignment of lexical heads to its arguments, plus a specification of the associated tense and aspect (p. 376). From this input, Gen generates all extended projections conforming to X-bar theory as alternative realizations of this argument structure. The output thus consists of representational simulations of transformational derivations using chains and traces, as Bresnan (1998c, 1) puts it. Bresnan argues for a more radically nonderivational theory of Gen, based on a parallel correspondence theory of syntactic structures (p. 1).
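As a toy illustration of restriction (5), the finitely many full extensions of an underspecified input can be enumerated directly. This sketch is mine, not the paper's: the feature inventory and value ranges are hypothetical, and f-structures are flattened to dicts for simplicity (no recursive embedding, exactly as the restriction demands).

```python
from itertools import product

# features that generation may add, each with a finite range of values
# (a hypothetical inventory for illustration)
EXTENDABLE = {
    "gf1": ["subj", "obj"],
    "gf2": ["subj", "obj"],
    "mood": ["decl", "int"],
}

def extensions(fs):
    # all fully specified extensions: fix every missing extendable feature
    # to one of its finitely many values; no new embedded f-structures
    open_feats = [f for f in EXTENDABLE if f not in fs]
    for values in product(*(EXTENDABLE[f] for f in open_feats)):
        yield {**fs, **dict(zip(open_feats, values))}

# input with mood already fixed; only gf1 and gf2 remain open
inp = {"pred": "see<x,y>", "tns": "past", "mood": "decl"}
exts = list(extensions(inp))
print(len(exts))  # 2 * 2 = 4 extensions
```

Principles like functional uniqueness would subsequently filter out extensions in which, say, both arguments are assigned subj; the point here is only that the search space is finite.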
The function Gen. With input structures defined along these lines, Gen can be straightforwardly formalized by a classical LFG grammar GGen. Given an input representation Φin, the set of candidate analyses Gen(Φin) is the set of LFG (c- and f-structure) analyses ⟨T, Φ⟩ generated by GGen, such that Φin ⊑ Φ, i.e., those analyses whose f-structure is subsumed by the input f-structure. This is exactly the classical generation task. To model Grimshaw's fragment of inversion data in English, the LFG grammar will have to formalize a theory of extended projections. This can be done on the basis of LFG's extended head theory as discussed in detail in (Bresnan 1998a, ch. 6-7). The principles Bresnan discusses can be fleshed out in a set of LFG rules, i.e., context-free rules5 with f-annotations. All analyses satisfy underlying LFG principles, such as functional uniqueness, and completeness and coherence. It should be noted that although the resulting grammar is formally an LFG grammar, it is certainly unusual since it overgenerates vastly, producing all universally possible c-structure-f-structure pairings. This is due to the special role that this LFG grammar plays as part of an OT model: given the different definition of grammaticality, the set of analyses generated by the LFG grammar is not the set of grammatical analyses, as classically assumed. Rather, it is the union over all possible candidate sets (for any input). Can the set Gen(Φin) just specified also be computed for a given Φin? We are looking at an instance of generation from underspecified f-structures. Wedekind (1995) shows the decidability of generation from fully specified f-structures; in contrast, Wedekind (to appear) presents a proof that generation from f-structures specified only partially is undecidable in the general case (see also Dymetman 1991). The degree of specificity postulated for the input in (5) will however ensure decidability.6 The task of generating candidates from an input can actually be performed by LFG systems that implement a generator, like the XLE system does.
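The subsumption condition Φin ⊑ Φ that delimits the candidate set can be sketched as a recursive check over feature structures. This is my own minimal encoding (nested dicts standing in for f-structures; the helper name `subsumes` is hypothetical), not a full unification-based implementation.

```python
def subsumes(general, specific):
    # True iff every feature-value pair in `general` is present in
    # `specific` (recursively), i.e. `specific` extends `general`
    for feat, val in general.items():
        if feat not in specific:
            return False
        if isinstance(val, dict):
            if not (isinstance(specific[feat], dict)
                    and subsumes(val, specific[feat])):
                return False
        elif specific[feat] != val:
            return False
    return True

phi_in = {"pred": "read<x,y>", "tns": "fut"}
# extends the input: a candidate f-structure
phi1 = {"pred": "read<x,y>", "tns": "fut", "subj": {"pred": "pro"}}
# clashes with the input on tns: not a candidate
phi2 = {"pred": "read<x,y>", "tns": "past"}

print(subsumes(phi_in, phi1), subsumes(phi_in, phi2))
```

A candidate set is then the set of analyses whose f-structure passes this check against the input.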
Generating unfaithful candidates. There are empirical situations where a language L1 contains a single grammatical expression for a number of different input forms, while a
Footnote 5: More precisely, a generalization of context-free rule notation which allows regular expressions on the right-hand side.

Footnote 6: Since (5) guarantees a finite number of extensions of Φin, we can divide the problem into a finite number of decidable subtasks: (i) extend the f-structure in all possible ways; (ii) generate from each of these fully specified f-structures; (iii) output the union of results from subtask (ii). (See also Wedekind to appear.)
language L2 makes a fine-grained distinction, providing different expressions for each case. A simple example would be the person and number distinction in verbal inflection: the continental Scandinavian languages constitute type L1, providing a single form in each tense for any person/number combination, whereas Icelandic is of type L2, providing practically full morphological distinctions. To account for such situations in OT, the assumption of unfaithful candidate analyses is necessary: besides candidate analyses that are faithful to the fine-grained distinction in the input, the candidate sets for both languages have to contain the less explicit analysis. The constraint ranking in language type L1 will favour the unfaithful candidate, while in type L2 the constraints checking for faithfulness are ranked higher, thus enforcing the overt differentiation. Bresnan's OT system for auxiliary selection (Bresnan 1998c, sec. 3) also relies crucially on the assumption of faithfulness violations. It is not totally straightforward how the generation of such unfaithful candidates should fit into the picture of Gen monotonically adding information to the input representation. However, the degree of the candidates' deviance from the input required for the systems in the literature to work seems to be within narrow limits: unfaithful candidates realize morphosyntactic feature values from the input differently, but no pred value will be changed, nor will any new pred-bearing f-structure be introduced. Since the faithfulness constraints have to see the original input anyway (in order to detect divergences), we can adopt the technical solution of keeping track of two copies of the relevant morphosyntactic features: one for the input specification, and one for the actual realization. (6) shows this idea in first approximation (there are some problems, which will be discussed in 3.2).
(6)  [ pred   'pro'
       pers   1
       num    sg
       gend   fem
       m-rlz  [ pers  3
                num   sg
                gend  masc ] ]
The input form of the agreement features appears in the main f-structure, while their actual surface realization is given in the copy of the features embedded under m-rlz (for morphological realization). The faithfulness constraints could now be formulated as enforcing the unification of these structures (which would result in a clash for (6), as intended). The grammar specification of GGen has to be altered accordingly to generate both the faithful and the unfaithful candidates (i.e., we introduce again much more overgeneration). With this trick, a limited amount of deviation can be simulated within the well-behaved monotonic system. Empirical studies will have to show whether the restriction is too strong; note, however, that allowing unrestricted deviation from the input would result in an undecidable system.
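The unification-based faithfulness check just described can be sketched as follows. This is my own toy encoding (flat dicts; the helper name `unifiable` is hypothetical): a Fill-style violation shows up as a value clash between the input features and their copy under m-rlz, while an underspecified realization slips through unification unnoticed, which is exactly the second problem taken up in 3.2.

```python
AGR = ("pers", "num", "gend")

def unifiable(fs):
    # faithfulness as unifiability: no agreement feature may carry
    # conflicting values in the input and under m-rlz
    rlz = fs.get("m-rlz", {})
    return all(f not in fs or f not in rlz or fs[f] == rlz[f]
               for f in AGR)

# (6): input 1st person feminine, realized as 3rd person masculine -> clash
fs_clash = {"pred": "pro", "pers": 1, "num": "sg", "gend": "fem",
            "m-rlz": {"pers": 3, "num": "sg", "gend": "masc"}}
# only num realized: unification succeeds, so this check misses the
# lack-of-specification (Parse-style) violation
fs_under = {"pred": "pro", "pers": 3, "num": "sg", "gend": "fem",
            "m-rlz": {"num": "sg"}}

print(unifiable(fs_clash), unifiable(fs_under))
```

The second result illustrates why a subsumption-based comparison, rather than plain unification, is considered below.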
3.2 Faithfulness and functional uniqueness
There are two problems with the proposal illustrated by (6), having to do with the classical LFG concept of functional uniqueness. A related point is also made in (Johnson 1998, sec. 3).7 The first problem becomes clear when we look at a typical agreement situation in which a structure like (6) plays a role: take for instance the simple case of determiner-noun agreement. Here, two c-structure nodes will give rise to f-annotations addressing this f-structure (shown schematically in (7)).

(7)
             DP
           /    \
         D        NP
                   |
                   N

     D:  (↑ agr) = i            N:  (↑ agr) = i
         (↑ m-rlz agr) = [D]        (↑ m-rlz agr) = [N]

     resulting f-structure:  [ agr    [ ]_i
                               m-rlz  [ agr ? ] ]

Since the m-rlz feature is part of f-structure, it will be unified whenever f-structures are unified. Thus, only faithfulness violations can be dealt with in which both of the c-structure exponents in an agreement configuration behave identically (i.e., [D] = [N]). This is certainly not a suitable restriction: in the typical subject-verb agreement examples, the subject may be a faithfully specified pronoun, but the verb agreement is unfaithful to the input (cf., e.g., Bresnan 1998c, sec. 3).8 The second problem comes about when underspecification and unification interact, and a faithfulness violation does not consist in a conflicting feature specification, but in the lack of specification.9 Assume that (8) is the relevant part of an unfaithful realization of 3rd person singular feminine agreement morphology (e.g., for a verb's subject), realizing only the number feature. We will not be able to detect this faithfulness violation by unifying the agreement features, since unification will succeed even if one structure is unspecified for a certain feature.
(8)  [ pred   'pro'
       pers   3
       num    sg
       gend   fem
       m-rlz  [ num sg ] ]
There are several possible solutions for these problems. To solve the first problem, we might assume that the structure under m-rlz is not part of f-structure, but rather a morphological projection from c-structure that makes a more fine-grained distinction (this has been proposed in the context of complex verb forms by Butt/Niño/Segond 1996). The second problem could be solved either by applying a more sophisticated comparison operation (checking for proper subsumption), or by introducing a special value unrealized for the m-rlz features. However, in both cases Johnson's (1998) objection that the unification machinery might be superfluous ballast (cf. fn. 7) seems to apply. Another solution would be to assume a richer substructure in the description of c-structure categories, i.e., complex category symbols. Bresnan (1998a) uses such complex symbols to express generalizations about an X-bar theory; the dimensions she uses in the complex description include lexical class features, functional status, and bar level. Now, the spelled-out morphosyntactic features could be added to these c-structure features, and the faithfulness constraints could check for the correspondence between the f-structure specification and the c-structure realization (without recurring to unification). In this set-up, no m-rlz feature would appear in the f-structure. This approach may be quite similar in effect to Johnson's (1998, sec. 3.1) suggestion of using features with a non-unificational interpretation, also distinguishing semantic argument-structure features appearing in the input from corresponding superficial verbal inflection features.10

Footnote 7: Johnson raises the issue whether the concept of functional uniqueness from classical LFG is needed at all in OT-LFG, suggesting that a purely resource-based feature interpretation suffices, as proposed in resource-sensitive LFG (R-LFG; Johnson 1999), a generalization of the linear-logic-based semantics for LFG (see, e.g., the contributions in Dalrymple 1999). He argues that the full unification machinery might be a superfluous ballast.

Footnote 8: An instance of a similar problem occurs for the Agr constraint that Bresnan (1998c, sec. 2) assumes (`A subject and its predicate in c-structure agree'). The discussion of her example (36) relies on an infinite verb not being marked for subject agreement, although it will f-structurally unify with a finite auxiliary.

Footnote 9: The latter would be a violation of Parse, but not of Fill, in the model of (Bresnan 1998c, sec. 3), following Grimshaw (to appear).
3.3 Constraint marking
Having suggested a formalization and computational treatment of the input and the Gen function, we next have to address how the constraint violations can be detected in the candidate analyses, to model the function marks. Applying marks to a candidate returns the multiset of the constraint labels for the constraints that the candidate violates. In most OT work the constraints are formulated in prose, and the application of marks is performed manually in illustrative tableaux for those candidates considered relevant. It is, however, a central assumption that the constraints can be formalized as structural descriptions of the type of representations output by Gen. I will first consider a very general architecture for constraint marking, which may however pose a decidability problem. Consequently, I propose a more restricted constraint marking technique and argue that it meets the actual requirements.
A general architecture for constraint marking. Formally, constraints will typically take the shape of universally quantified implications: whenever a structure satisfies the description A, it should also satisfy the description B. A clear example of such a general formulation is Bresnan's (1998c) definition of Op-Spec:

(9)  Op-Spec (Bresnan 1998c): an operator must be the value of a df [discourse function] in the f-structure
Footnote 10: Yet another solution for both problems would be to leave the m-rlz structure completely out of the representation and deal with faithfulness within the descriptions, i.e., the f-annotations in the rules and the lexicon. (This presupposes that the constraint marking is folded into the grammar formalizing Gen, which will be the result of the considerations in section 3.3 below.) Dalrymple/Kaplan (1997) propose a treatment of agreement in terms of set membership constraints. A form that can realize, say, nominative or accusative case will come with the annotation (↑ case) ∈ {nom, acc}. In modelling faithfulness, this system would allow for a fine-grained distinction between (i) an exact match (membership in a singleton set), (ii) different degrees of lack of specification (Parse violations, cf. fn. 9; membership in a set with two or more elements), and (iii) conflicting specification (Fill violations; non-membership in the set).
To express such conditions formally, a feature logic is required that includes general negation and quantification over c-structure nodes and f-structures (see (10a)). Following B. Keller (1993), one could alternatively use a logic without universal quantification, but with general negation and unrestricted functional uncertainty11: (10b). (10b) should be thought of as an f-annotation at the root node of the grammar; f is here a local variable for an f-structure, similar to the metavariables ↑ and ↓.

(10) a. ∀f.[∃g.((f op) = g) → ∃h.((h df) = f)]
     b. ¬[(↑ gf*) = f ∧ (f op) ∧ ¬(df f)]

Since many constraints address the c-structure, we will have similar formulations involving c-structure nodes. Let us assume for the moment that such a logic can be handled (we will come to the satisfiability problem shortly, which is undecidable). One way to implement the constraint marking would be to assume a separate module that would take the output of Gen, a set of candidate analyses, as its input and compute the constraint violations. Since this extra component would be dealing with the same type of representations as the component implementing the Gen grammar, an obvious simplification is to use the same system. This would have the great advantage that the marks computation can immediately exploit sophisticated packing techniques for the representation of the candidate set (Maxwell/Kaplan 1989). Since we can think of the functional-uncertainty version of the constraints as annotations to the root node, we could formalize marks by the following construction. Assume that GGen = ⟨N, T, S, R⟩ is the LFG grammar implementing Gen (N is the set of nonterminal symbols, T the set of terminal symbols, S the start symbol, and R a set of annotated rules).
Con is the set of constraints, formulated in the LFG functional description language, as typically used in f-annotations, including negation and unrestricted functional uncertainty (or, more precisely, regular designators for both feature descriptions and tree descriptions). For C ∈ Con, label(C) is defined as the name label of the constraint. We can now define an extended grammar GGen,marks = ⟨N′, T, S′, R′⟩, where S′ is a new start symbol, N′ = N ∪ {S′}, and R′ = R ∪ {S′ → S : ↑=↓ ∧ ACon}. The constraint marking part of the rule annotation ACon is constructed from the set of constraints Con as follows:

(11)  ACon = ∧_{C ∈ Con} (C ∨ (¬C ∧ label(C) ∈ (↑ marks)))
For each constraint, a conjunct exists which enforces the constraint or, alternatively, if the constraint is not satisfied, introduces the label of the constraint into the multiset under the feature marks. Ultimately, the value of marks will be the multiset of (starred) constraint labels that the respective candidate analysis violates, i.e., the effect of the marks function we wanted to formalize. Note that with this construction, both the effect of Gen and marks are implemented within a single LFG grammar. To complete the OT model, we merely need a component comparing the marks values of the different candidates according to the definition of Eval. (This component must quite obviously be located outside the LFG grammar, since it has to operate on several analyses simultaneously.) As announced, we have to address the expressiveness of the logic underlying this set-up in more detail: the satisfiability problem for feature logics with universal quantification and general negation is undecidable (B. Keller 1993, 4.4). This would not be a problem if the candidate set generated prior to the constraint marking could be guaranteed to be finite. Then one could assume that the constraint formulae are restricted to the minimal model of the candidate representation being tested. Note however that even with the restriction (5) from section 3.1, an LFG grammar can generate an infinite number of analyses for a given input. For example, the grammar may allow for c-structure recursion that does not involve any pred-bearing lexical material (the idea being to have this freedom constrained by the OT constraints). In fact, Grimshaw (1997) assumes explicitly that an extended projection may contain arbitrarily many functional projections. For negative preposing in English, an analysis comes out as optimal that contains an extra functional projection between IP and CP, which Grimshaw labels XP (p. 400). The infinity of the analyses generated is not a problem for the decidability of the generation task itself (a finite representation of all possible analyses can be produced), but it is very likely that it becomes a problem when a separate component should check constraint satisfaction: in the general case, this component would have to unfold the finite representation resulting from recursion, and it is not clear how this can terminate. Another unclear point is how multiple constraint violations, which are crucial in linguistic OT work, could be captured. A given model may fail to satisfy a certain formula, but it is not clear in what way one could say that it fails to satisfy the formula more than once.

Footnote 11: The interpretation of functional uncertainty in the standard LFG formalism is more restricted, excluding cyclic interpretation (Kaplan/Maxwell 1988). This ensures decidability of the satisfaction problem.
One would probably have to assume constraint formulations checking for multiple occurrences of the illegal configuration (cf. also Karttunen 1998, to be discussed in section 4.1 below).
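The intended effect of the marking architecture, a multiset of labels for the violated constraints with multiple violations counted separately, can be sketched as follows. This is my own toy encoding, not the paper's construction: analyses are flattened to lists of node records, and the two constraint predicates are simplified stand-ins for the structural descriptions discussed above.

```python
from collections import Counter

def op_spec(analysis):
    # one violation per operator not in specifier position (toy encoding)
    return [1 for node in analysis
            if node.get("op") and node.get("pos") != "spec"]

def stay(analysis):
    # one violation per trace in the candidate
    return [1 for node in analysis if node.get("trace")]

CONSTRAINTS = {"Op-Spec": op_spec, "Stay": stay}

def marks(analysis):
    # the marks function: a multiset of violated-constraint labels
    m = Counter()
    for label, constraint in CONSTRAINTS.items():
        violations = constraint(analysis)
        if violations:
            m[label] = len(violations)
    return m

# candidate with the wh-operator fronted: operator in spec, two traces
cand = [{"op": True, "pos": "spec"}, {"trace": True}, {"trace": True}]
print(marks(cand))
```

Counting violations by enumerating offending configurations, rather than by testing a single formula, is precisely what makes multiple violations unproblematic here.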
Constraint marking within the LFG grammar. There is an alternative way of realizing the constraint marking that avoids the decidability problem and allows for much more efficient processing. Violations of a particular constraint typically result only from the applications of particular rules. That means we can take the interleaving of the Gen function and the marks function in our LFG grammar GGen,marks a little further and introduce constraint labels to the marks multiset in the places in the grammar where the structure violating a constraint is created. In other words, all rules may contain f-annotations that implement part of the marks function (besides having the usual f-annotations which contribute to formalizing Gen). This technique has the positive side effect that it avoids the potentially enormous complexity involved in the application of various functional uncertainty expressions to the entire analysis.12 Since constraint labels can now be introduced into the marks multiset in every rule, we have to ensure that all contributions are collected and made available at the root node of the analysis. This is achieved by identifying the marks feature of all daughter constituents with the mother's by the equation (↑ marks)=(↓ marks), creating a single multiset for the
Footnote 12: Since the constraint tests are part of the grammar, no unfolding of infinitary representations (cf. the previous paragraph) is required. The decidability of the generation problem will also guarantee the decidability of constraint marking.
complete analysis. Note that multiple violations of a single constraint fall out from the use of a multiset.13 If we use a special projection o instead of the feature marks (and assume implicit trivial equations identifying the o-structure of all constituents), we are very close to the system of Frank et al. (1998), which is built into the XLE system. (The difference will be addressed in 3.5.) The XLE system also provides an (extended) implementation of the Eval function, based on the marks introduced to the o-projection, and a dominance hierarchy specified in the configuration section of the grammar.14 Let us go through an illustrative example: the formalization of LFG's extended head theory will include a rule like (12) or similar, with the (c-structure) head of V′ optional. Among the violable constraints there are (13) and (14) (Bresnan 1998c, sec. 2).15

(12) V′ →  ( V )    ( XP )
            ↑=↓     (↑cf)=↓
(13) ob-hd: every projected category (X′, X′′) has a lexically filled head.

(14) stay: categories dominate their extended heads.
Being part of grammar G_Gen, rule (12) will produce candidates that violate constraints (13) and/or (14) as well as candidates that satisfy these constraints. If we formulate the optionality of V as an explicit disjunction ({ V | ε }), we can tell the two types of candidates apart by simply checking which disjunct was used in the analysis. Thus, we reach the desired marking effect if we introduce a mark for the violation of the constraint stay in an annotation of the empty disjunct:16

(15) V′ →  { V    |  ε            }   ( XP )
             ↑=↓     stay ∈ o(M)        (↑cf)=↓

For constraints that involve a more complex condition, additional disjunctions can be introduced in the f-annotation in order to be able to identify a constraint violation. A potential advantage of this technique over an isolated constraint marking module, besides the complexity issues, is that one can avoid introducing representations that have their sole motivation in constraint identification. Thus, the copying of structure as a basis for faithfulness constraints as discussed in section 3.2 could be avoided (cf. fn. 10).

13 An alternative way using a standard set would be to interpret the constraint marks introduced as instantiated symbols (like the pred values in standard LFG), i.e., as pairwise distinct.

14 As Frank et al. (1998) discuss in detail, XLE distinguishes several types of constraint marks, in particular preference marks besides dispreference marks. For the purposes of modelling a standard OT account, the dispreference marks suffice.

15 The constraint (14) is stricter, since (13) allows for the situation that the (extended) head of V′ is located in the functional categories I or C. cf is an underspecified notation for complement functions.
16 To identify a violation of (13) in terms of a particular rule alternative, some slightly more complicated construct is needed; there are different options: a flag feature that can be checked for, or a more fine-grained distinction of (complex) category symbols that allows a distinction of overtly headed and unheaded categories (cf. section 3.2 above).
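The evaluation over such marks multisets can be sketched in a few lines. The following is a minimal illustration under stated assumptions (it is not the XLE implementation): candidates carry multisets of constraint labels, and the ranking is a list of labels ordered from highest- to lowest-ranked constraint.

```python
from collections import Counter

def eval_candidates(candidates, ranking):
    """Return the most harmonic candidates.

    candidates: list of (analysis, marks) pairs, where marks is an
    iterable of constraint labels, i.e., a multiset of violations.
    ranking: constraint labels, highest-ranked first (hypothetical names).
    """
    def profile(marks):
        counts = Counter(marks)
        # Violation profile: counts per constraint, highest-ranked first;
        # tuples compare lexicographically, mirroring strict dominance.
        return tuple(counts[c] for c in ranking)

    best = min(profile(marks) for _, marks in candidates)
    return [(a, m) for a, m in candidates if profile(m) == best]
```

For example, under the ranking [ob-hd, stay], a candidate violating stay once beats a candidate violating ob-hd once, since the profiles (0, 1) and (1, 0) compare lexicographically.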
A note on generality. Since the LFG rules may apply recursively, the grammar-based introduction of constraint violation marks can cover situations of arbitrarily many constraint violations. Thus no need for universally quantified implications or functional uncertainty expressions with a cyclic interpretation arises in this set-up. Nevertheless, this approach fails to make certain generalizations explicit: for example, the constraint stay (14) is intended to apply to any X′ and XP category. But the formulation of constraints proposed here would in principle allow a very unsystematic distribution of the stay identifier, e.g., omitting it in the N′ rule, or introducing it elsewhere in an entirely different situation. Note however that this lack of explicit generalization is not specific to the newly introduced labelling of violable constraints. The same objection could be raised against the inexplicitness of extended X-bar theory when the grammar is formulated as a set of annotated c-structure rules (one could, e.g., write down a rule A′ → N V′). The appropriate way to look at these rules from a theoretical perspective is thus to regard them as controlled by general meta-constraints (which could in principle be made explicit in a precompilation scheme); these meta-constraints will not only ensure that the actual c-structure rules conform to X-bar theory, but also that markers identifying constraint violations are introduced appropriately.17
3.4 The parsing task
So far, we have looked at the application of the LFG grammar G_Gen,marks only in the stepwise manner underlying the definition of grammaticality in OT, which took us from an input f-structure via the generation of alternative candidate analyses to the identification of constraint violations, which is input to the evaluation function computing the optimal analysis. While this alone may be illustrative for an experimental system, we expect a little more of the implementation of an OT grammar: it should recognize the strings in the language generated by the OT grammar, and it should moreover assign the structures of the grammatical analyses to the strings. Johnson (1998, sec. 4) formulates the following parsing problem:

(16) The universal parsing problem for OT-LFG: Given a phonological string s and an OT-LFG G as input, return the input-candidate pairs ⟨i, c⟩ generated by G such that the candidate c has phonological string s and c is the optimal output for i with respect to the ordered constraints defined in G.
In the light of the parsing-based application of an optimizing competition by Frank et al. (1998), it is very important to note the following (cf. also the discussion in Johnson 1998, sec. 4.2): simply applying the grammar G_Gen,marks introduced in 3.3 in the opposite direction, i.e., parsing a string rather than generating from an input f-structure, and computing the most harmonic of the alternative parsing analyses does not have the effect of solving the parsing problem (but see 3.5). Although technically the original grammar G_Gen,marks can be

17 When carefully dealt with, the liberty in constraint formulation has certain advantages for explorative design tasks: when a linguist wants to check a certain hypothesis about OT syntax, she/he can quickly write (or modify) a grammar for a fragment, marking only the relevant constraints. Based on this fragment, she/he will be able to experiment with alternatives in ranking or in constraint formulation without having to worry about orthogonal issues to do with universal constraints. For example, the entire OT fragment of Bresnan (1998c, sec. 2) was relatively easy to implement in the XLE system with this strategy, i.e., leaving aside instances of constraint violation where they were obviously irrelevant.
used to rank alternative parsing analyses of an input string, the resulting optimal candidates will not satisfy the generation-based grammaticality definition: the Gen grammar was set up to allow for all universally available alternatives. For example, with the G_Gen,marks grammar for the Grimshaw/Bresnan fragment we discussed above, the strings in (17b) are among the ones generated for the input f-structure (17a) (with What does she read being the optimal candidate). (17)
a. [ pred  'read⟨x,y⟩'
     gf1   [ pred 'pro'
             pers 3
             num  sg
             gend fem ]_x
     gf2   [ op   q
             pred 'thing' ]_y
     tns   pres ]

b. Reads she what
   Read she what
   She reads what
   She read what
   She do read what
   ...
   What does she read
   Do she read what
   Does she read what
Suppose we try to parse one of the ungrammatical strings like She do read what, applying G_Gen,marks. It will receive at least one analysis; now, we might hope that the optimizing competition will rule out this candidate analysis. However, all candidates in the set of analyses constructed in parsing, i.e., our candidate set in this context, are analyses of this very string. So, no matter which one wins according to the optimization, the optimal analysis will trivially have this string as its yield, although this string is ungrammatical. The same is illustrated by the abstract illustration (18), taken from Johnson (1998, sec. 4.1):

(18)  Inputs:      i1           i2
                  /  \         /  \
     Candidates: c1    c2     c3    c4
     Strings:    s1    s2     s2    s3
                 <-- increasing optimality

Parsing the string s2 will produce the analyses c2 and c3. Being more optimal, the candidate analysis c2 would win a competition among these two candidates sharing the same string; however, for the input underlying c2 (the predicate-argument structure i1), there is a more optimal analysis: c1 (with a different surface string: s1). Thus c2 is in fact not a grammatical analysis. In this case, the alternative parse c3 is indeed the optimal candidate for its underlying input representation i2. This shows that in the straightforward formalization of OT, with a real competition, more care has to be taken to ensure that the right candidate set enters the competition even when the processing direction for the overall system is turned around. We should use G_Gen,marks in the parsing direction only to find out the possible f-structures. Then we can extract from these f-structures the amount of information that forms an OT input. We perform a backward generation step, generating from the extracted OT input f-structures, applying the original generation-based competition.
For each of these competitions there are two possibilities: (i) the optimal candidate has a different phonological string; this means that the string we started from is not grammatical for that input; or (ii) the optimal candidate has the string we started from; this means we have found one grammatical analysis. If case
(ii) occurs for none of the competitions based on inputs extracted from the parsing results, then the string is not contained in the language generated by the OT grammar. To put this a little more formally: assume an LFG grammar G, which defines triples ⟨T, φ, O⟩, where T and φ are a c-structure/f-structure analysis in the classical sense, and O is a multiset of constraint marks for the constraints violated by this LFG analysis. The grammar of a language L is defined by the constraint ranking R_L. We furthermore assume that a filter F exists taking a fully specified f-structure to an underspecified f-structure, which contains just the amount of information we assume as the input, in accordance with the restriction (5). The function yield applies to a c-structure tree and returns the string of terminal symbols. Let us first go through the simpler case of language production. Here the optimal analysis is determined as follows: the input consists of an (underspecified) f-structure i; (P-i) we determine all analyses ⟨T, φ′, O⟩ in G with i ⊑ φ′; (P-ii) eval computes the set of optimal analyses (often a singleton) based on the constraint violations O and the language-specific constraint ranking R_L. So, we can think of production as a function that takes an underspecified f-structure to a set of optimal analyses ⟨T_k, φ_k, O_k⟩.

[Figure 1 appears here. Starting from the string w = `a b c': (U-i) parsing with G yields analyses A_p,1 ... A_p,n, each with its f-structure and constraint marks; (U-ii) the filter F extracts the predicate-argument structure from each f-structure; (U-iii) generation with G from each extracted input produces candidate sets A_g,i1 ... A_g,im, and eval applies locally to each generated candidate set; (U-iv) the strings of the optimal candidates are compared with the input string w: analyses whose string differs from w are discarded (✗), the others are grammatical (✔).]
Figure 1: Applying optimality in language understanding (schematic illustration).

Now we can address language understanding, the general parsing problem (cf. also the schematic illustration in fig. 1). The system starts out with a string w; (U-i) first of all, the parser determines all analyses A_p = {⟨T, φ, O⟩ in G with yield(T) = w}; after (U-ii) filtering the input information F(φ_j) out for each analysis ⟨T_j, φ_j, O_j⟩ ∈ A_p, we (U-iii) apply the production function to each of the resulting underspecified f-structures to determine the respective optimal candidates under a generation view: we obtain a set of optimal analyses for each parse of the string (again, often a singleton set); (U-iv) from the union of these winner sets we subtract all analyses for which yield(T) ≠ w. The result is the set of grammatical analyses for the input string w. Due to the subtraction of those analyses yielding a string different from the input in step (U-iv), the situation can occur that the set of grammatical analyses is empty. This is a difference from the purely generation-based view, where the set of optimal analyses will always contain at least one candidate (unless the grammar G contains no analysis for the input, i.e., the candidate set is empty).
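The steps (U-i)-(U-iv) can be put into schematic code. The following sketch uses hypothetical placeholder interfaces (parse, extract_input, generate, yield_of stand for the components described above; they are not actual XLE functions), and a compact inline version of the harmony evaluation:

```python
from collections import Counter

def most_harmonic(candidates, ranking):
    # candidates: (analysis, marks) pairs; lower profile = more harmonic
    profile = lambda marks: tuple(Counter(marks)[c] for c in ranking)
    best = min(profile(m) for _, m in candidates)
    return [(a, m) for a, m in candidates if profile(m) == best]

def grammatical_analyses(w, parse, extract_input, generate, ranking, yield_of):
    """Steps (U-i)-(U-iv) of the OT-LFG parsing scheme (a sketch)."""
    winners = []
    for analysis in parse(w):                              # (U-i)
        i = extract_input(analysis)                        # (U-ii), filter F
        winners += most_harmonic(generate(i), ranking)     # (U-iii)
    # (U-iv): discard winners whose yield differs from the input string
    return [(a, m) for a, m in winners if yield_of(a) == w]
```

On a toy encoding of Johnson's schematic (18), parsing s2 yields only c3 as grammatical, and parsing s3 yields the empty set, since the optimal candidate for c4's input has a different string.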
3.5 Competition among parsing alternatives
The discussion of the parsing problem with a standard OT grammar suggested that competition between candidates sharing the same surface string does not take us anywhere useful when we are interested in formalizing the concept of grammaticality. However, Frank et al. (1998) show that the ranking of alternative parses can be very useful as a preference mechanism, certainly based on different constraints, and with a different underlying LFG grammar (with a classical concept of grammaticality).18 With the formalization proposed in the previous subsections at hand, we are in a position to address the question whether constraint systems with generation-based competition are necessarily distinct from or even incompatible with systems designed for a parsing-based competition. Since this point is related only indirectly to the main point of this paper, I will just briefly sketch an argumentation in favour of compatibility. The basic idea is very simple. The technique sketched in fig. 1 stops at the point where the winners of the generation-based competition, the grammatical analyses, are checked against the original string. In the situation where more than one analysis survives this step, i.e., when the string is ambiguous, we could again apply Eval to single out certain of these grammatical analyses as optimal in a different sense. A natural interpretation of optimality in this context is preference of the respective reading. I will call the technique of applying an OT-style competition both in generation (based on a candidate set with a common input) and in parsing (based on a common string) a bidirectional competition technique.19

18 Also, Smolensky (1996) proposes to explain the lag of children's production abilities behind their ability in comprehension by assuming that in comprehension a simple parsing-based optimization is performed, which makes it possible to process the strings that the child hears with the same constraint ranking that is applied in production.
Thus in comprehension, many analyses are accepted that are not grammatical under the child's current constraint ranking (according to the Gen-based definition of grammaticality). The simple parsing task is liberal enough not to filter them out. However, in production, the common underlying structure does determine the candidate set, and the constraints will have a strong filtering effect. The result is a very reduced production ability for the initial constraint ranking. The simpler parsing task is called robust interpretive parsing in Tesar/Smolensky (1998), and plays an important role in the learning algorithm.

19 There are two conceivable combinations, depending on which condition has higher priority. For the parsing task we are discussing, it is the common-string condition that takes precedence, i.e., for all analyses of a string, (i) a generation-based competition is computed to determine which string analyses are grammatical, and (ii) the optimal analysis among the grammatical ones is chosen as the preferred reading. The other possible combination plays no role in this paper: one might generate all possible strings for a
If we find empirical cases where the result of this bidirectional competition based on the same set of constraints coincides with the intuitively preferred reading, this shows an interesting generalization of OT. To bring this out, we need to look at an empirical domain which involves a fair amount of realization alternatives and ambiguity. A good example is the relatively free word order in German, modelled within OT-LFG by Choi (1999). In the German Mittelfeld (the region between the finite verb in verb-second position and the clause-final verb position), nominal arguments of the verb can appear in any order. However, as has been widely observed (cf., e.g., Lenerz 1977; Höhle 1982; Abraham 1986; Uszkoreit 1987), a certain canonical order is less marked than the others. Deviations from this canonical order are used to mark a special information structure (or topic-focus structure), i.e., these non-canonical orderings are more restricted through context. Sentence (19) reflects the neutral order as it would be uttered in an out-of-the-blue context. Variant (20a) will be used to mark dem Spion as the focus; (20b) furthermore marks den Brief as the topic.

(19) daß der Kurier dem Spion den Brief zustecken sollte
     that the courier (nom) the spy (dat) the letter (acc) slip should
(20) a. daß der Kurier den Brief dem Spion zustecken sollte
     b. daß den Brief der Kurier dem Spion zustecken sollte
Choi (1996:150) models these data assuming competing sets of constraints on word order: the canonical constraints, based on a hierarchy of grammatical functions (and, in principle, also a hierarchy of thematic roles) (21); and information structuring constraints (distinguishing the contextual dimensions of novelty and prominence, each marked by a binary feature) (22).

(21) canon (Choi 1996:150)
     a. cn1: SUBJ should be structurally more prominent than (e.g. `c-command') non-SUBJ functions.
     b. cn2: Non-SUBJ functions align reversely with the c-structure according to the functional hierarchy. (SUBJ > D.OBJ > I.OBJ > OBL > ADJUNCT)

(22) Information Structuring Constraints (Choi 1996:150)
     a. new: A [−New] element should precede a [+New] element.
     b. prom: A [+Prom] element should precede a [−Prom] element.
given input structure (e.g., i2 in (18)), then parse each of these strings (s2 and s3) applying a preference ranking. Those strings for which the preferred reading does not match the intended meaning are discarded (in the example s2, whose preferred analysis c2 has i1 as the underlying meaning). For the remaining generation alternatives of the original input, the most optimal one is determined in a generation-based competition. Note that although this final step is very close to the standard OT competition, the outcome may be quite different: crucial competitors may have been filtered out of the candidate set by the intermediate backward competition for the preferred readings of a string (in the example, s3 would win rather than s2). This conception of language production may be quite useful for modelling the intuition that speakers tend to avoid misleading utterances, but at the current stage of research this is a highly speculative remark. The cases that have to be checked empirically are those where the standard unidirectional competition predicts an analysis to be optimal that is not the most harmonic analysis among all analyses over its underlying string. If such sentences are ungrammatical or less acceptable (according to gradedness judgements as discussed by F. Keller 1998), the bidirectional competition might be more adequate.
Based on an appropriate ranking of these constraints (prom ≫ cn1 ≫ {new, cn2}), Choi can predict the optimal ordering for a given underspecified f-structure (which in this case will also contain a description of the informational status of the verb arguments). When the arguments don't differ in informational status, the canonical constraints will take effect, leading to the order in (19); when there are differences, the unmarked order will however violate information structuring constraints, such that competitors with a different ordering can win out. Like the Grimshaw/Bresnan fragment, Choi's assumptions about Gen can be formulated as an LFG grammar and (with some simplifications) extended with identifiers introduced to mark alternatives not satisfying violable constraints, i.e., we can again assume an LFG grammar G_Gen,marks. For sentence (19) and its ordering variants, bidirectional optimization doesn't give results that go beyond what can be reached with generation-based competition alone, since in parsing the NPs can be unambiguously mapped to argument positions. However, if we look at sentences with ambiguous case marking like (23) and (24), the situation changes.

(23) daß Hans Maria den Brief zustecken sollte
     that H. (nom/dat/acc) M. (nom/dat/acc) the letter (acc) slip should
(24) daß Otto Maria Hans vorschlagen sollte
     that O. (nom/dat/acc) M. (n/d/a) H. (n/d/a) suggest should
Parsing (23) with the appropriate G_Gen,marks grammar will result in two classes of analyses: one with Hans as the subject and Maria as the indirect object, and one with the opposite distribution. The latter reading is strongly preferred by speakers of German; however, there is no way of avoiding this ambiguity with hard constraints. Neither will the generation-based OT competition predict any difference, since the two readings are not members of the same candidate set. For (24), even more readings become possible: any of the three NPs can fill any of the three available argument positions. Nevertheless, speakers clearly prefer one reading.20 If we apply the OT parsing scheme from fig. 1, with the additional preference optimization among the grammatical alternatives, Choi's original constraints will predict exactly these observations. Since in this additional optimization the string is fixed for all competing candidates, the analysis which violates the fewest constraints will be the one which interprets the arguments in such a way that the observed order is in line with the canonical order. Thus, for the constraints that Choi (1999) assumes, the standard OT generation-based view can be generalized to the parsing scenario if a bidirectional competition is applied.21

20 Note however that further factors are at work: in (i), which also contains ambiguous case marking, the selectional restrictions of the verb clearly overrule the ordering preferences; the absurd reading of the opera composing Mozart doesn't occur to a speaker of German (neither does the sentence sound odd). This shows that a longer story needs to be told about the interaction of the various components.

(i) daß diese Oper Mozart komponiert hat
    that this opera (nom/acc) M. (nom/acc) composed has

21 Under this extended view on possible competitions, the preference mechanism of Frank et al. (1998) is just a special case of an OT grammar, in which only the competition in the parsing direction takes place.
The determination of the grammatical candidate analyses is based on a classical grammar without OT competition. Thus, no bidirectional processing is required.
It is an open question what properties determine the usefulness of a constraint, or an entire constraint system, in a bidirectional competition.
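To make the extra parsing-direction competition concrete, the following toy sketch counts violations of simplified versions of Choi's constraints over the arguments of a clause in surface order. The feature encoding and the constraint definitions are my illustrative approximations, not Choi's actual formulation, and cn2 is left out for brevity.

```python
from itertools import combinations

def marks(reading):
    """Count violations for one reading: a list of the argument NPs in
    surface order, each with a grammatical function and binary
    [New]/[Prom] features (toy encoding, hypothetical keys)."""
    m = {"prom": 0, "cn1": 0, "new": 0}
    for a, b in combinations(reading, 2):          # a precedes b
        if (not a["prom"]) and b["prom"]:
            m["prom"] += 1                         # [-Prom] precedes [+Prom]
        if a["gf"] != "SUBJ" and b["gf"] == "SUBJ":
            m["cn1"] += 1                          # non-SUBJ precedes SUBJ
        if a["new"] and not b["new"]:
            m["new"] += 1                          # [+New] precedes [-New]
    return m

def preferred(readings, ranking=("prom", "cn1", "new")):
    """The string is fixed; the most harmonic of its grammatical
    readings is chosen as the preferred reading."""
    return min(readings, key=lambda r: tuple(marks(r)[c] for c in ranking))
```

With neutral information structure, a reading that maps the first NP to SUBJ incurs no cn1 violation, so the interpretation aligning the observed order with the canonical order is preferred.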
3.6 The complexity problem
The computational model for OT presented in subsections 3.1-3.4 follows the definition of the notions involved quite closely. In particular, it realizes the specification of the candidate set entering the optimizing competition by actually constructing the analyses. Since the underlying LFG grammar is intentionally kept highly unrestricted (as most restriction is performed by the violable constraints), relatively simple sentences already involve a great number of candidate analyses. However, using the standard LFG architecture and the highly developed XLE system, the techniques of packed representations are exploited (cf. Maxwell/Kaplan 1989). This means that the set of analyses is never effectively enumerated, in particular since the Gen and the marks function are interleaved in the grammar specification. Constraint violations are marked as the corresponding structure is created, and thus locally restricted constraints will cause little interaction in the packed representation of the candidate set. Nevertheless, the complexity burden of constructing all conceivable candidate analyses is enormous when an OT grammar is compared with a classical grammar covering a similar set of data. The crucial factor here is the (intentionally) high degree of ambiguity in the grammar modelling Gen.22 A particularly unfortunate circumstance from the viewpoint of complexity considerations is the need for parsing and backward generation from every analysis in the parsing task (section 3.4).
Both subtasks work with the highly unrestricted underlying grammar, and unless the generation task can profit much from the result of the parsing task, one has to assume that the complexities multiply, to a first approximation.23 Note however that for discarding a candidate as ungrammatical in the backward generation step, it suffices to show that there is a more harmonic analysis with the same input but a different surface string, so a processing strategy might be to start out with the parsing analysis and systematically try to vary the parts of the structure that caused some constraint violation, proceeding from the most highly ranked constraint downwards. To classify an analysis as grammatical, however, all competitors have to be generated.

22 Already Kaplan/Bresnan (1982, 272) note in the discussion of the exponential processing complexity of LFG: "For our formal system, this processing complexity is not the result of a lengthy search along erroneous paths of computation. Rather, it comes about only when the c-structure grammar assigns an exponential number of c-structure ambiguities to a string. To the extent that c-structure is a psychologically real level of representation, it seems plausible that ambiguities at that level will be associated with increased cognitive load." Whatever psychological reality one likes to assume for c-structure in classical LFG, it is quite obvious that the c-structure of the grammar formalizing Gen in OT-LFG, covering all universally possible c-structures over a given string, has no plausible reality in the processing of a single given language.

23 Johnson (1998, sec. 4.1) even points out that the universal parsing problem for OT-LFG might be undecidable with an unrestricted LFG grammar formalizing Gen, sketching a construction that allows one to specify a grammar that encodes the steps of a Turing machine. The construction depends however on a candidate analysis that is structurally unrelated to the input, so the restriction (5) we made precludes this situation. Since parsing with an LFG grammar is decidable, and the generation from f-structures is decidable unless recursive f-structure may be added (cf. section 3.1 above), the combined task is also decidable.
An additional point worth mentioning is that with a bidirectional constraint system as discussed in 3.5, preferences among analyses based on the same string could be exploited in a control strategy for the backward generation step illustrated in fig. 1: using the G_Gen,marks grammar, the preference ranking of the parsing analyses is already clear prior to the generation task (with the generation-based decision about grammaticality pending). If the order of applying backward generation to the various analyses of the input string follows the preference order, and one is only interested in the most preferred grammatical analysis, the overall processing complexity is decreased considerably in the average case: one can stop with backward generation as soon as a single analysis has been confirmed to be optimal, and thus grammatical. A garden path effect will occur for cases where the most preferred analysis of the string turns out to be ungrammatical after backward generation.
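This control strategy can be sketched as follows, again with hypothetical placeholder interfaces: parses are processed in preference order, and backward generation stops at the first parse confirmed grammatical.

```python
def first_grammatical(w, parses_by_preference, extract_input,
                      generate_winners, yield_of):
    """Return the most preferred grammatical analysis of w, or None.

    parses_by_preference(w): analyses of w, most preferred first;
    generate_winners(i): the optimal candidates for input i (backward
    generation plus eval); yield_of(a): the surface string of analysis a.
    All interfaces are placeholders for the components of fig. 1.
    """
    for analysis in parses_by_preference(w):
        i = extract_input(analysis)
        # Grammatical iff some optimal candidate for i has string w.
        if any(yield_of(a) == w for a in generate_winners(i)):
            return analysis          # stop early: no further generation
    return None                      # every parse lost: w is ungrammatical
```

A garden path corresponds to the loop iterating past its first, most preferred parse before succeeding (or failing altogether).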
4 Optimization as an offline procedure

In the face of the potentially severe complexity problems of the more or less straightforward OT formalization presented in the previous section, the question arises whether there is an alternative formalization that allows for more efficient processing. When constructing tableaux manually, there is a point where one stops adding candidates of a certain pattern, since it is intuitively clear that they will not have the slightest chance to win the competition. Likewise, one would expect a processing system to know in advance about hopeless candidates. This intuition is only in part (through the exploitation of a packed representation of the candidate set) reflected by the formalization from section 3. If more of it can be captured formally, this may be a key to an efficient OT model. The obvious problem is that, unlike with hard constraints, in OT one cannot discard an analysis on the basis of a local constraint violation, since the analysis may still be the best of all possible ones due to more highly ranked constraints. Nevertheless, a striking property of OT systems in the literature is the relatively restricted structural domain to which the competition can be limited.24 Karttunen (1998), extending ideas by Frank/Satta (1998) and Hammond (1997), shows that a precompilation of the optimizing competition can be realized in the finite-state approach to computational phonology. I will review his approach briefly in subsection 4.1. In subsection 4.2, I will address the question how a similar idea could possibly be included in an LFG architecture.
4.1 Karttunen's (1998) lenient composition
In a computational account of OT phonology, all the components dealing with linguistic structure (the input, Gen, and the constraints) can be captured in the finite-state calculus, i.e., as regular expressions characterizing finite-state transducers (Ellison 1994; Eisner 1997). For syllabification (Karttunen's (1998) example), the input is, e.g., a string of consonants and vowels; Gen is a transducer that adds structuring brackets ("O[" ... "]" for onset, "N[" ... "]" for nucleus and "D[" ... "]" for coda). Gen works in a fairly unconstrained manner,
24 For instance, Grimshaw (1997) assumes that the relevant candidate sets are alternative extended projections (cf. the quotation in fn. 4), and Sells (1998, fn. 2) makes explicit that his analysis hinges on a separate competition for each X0 item (and, one could add, the extended projection of the X0).
i.e., allowing both for unparsed input elements (marked by "X[" ... "]") and for overparsing (introducing an empty pair of brackets). Karttunen (1998) gives a precise definition and reports that the minimized transducer modelling Gen has 22 states and 229 arcs. For the word abracadabra it generates 1.7 million output candidates (whereas the network representing the mapping has 193 states, which clearly shows the usefulness of applying finite-state techniques). (25) visualizes how one can think of the transducer formalizing Gen: in (a.) it is applied to the word in, in (b.) to bin. The upper and lower sides are regular languages, shown here as (incomplete) enumerations of the strings they contain.

(25) a.  in   --GEN-->  N[i]D[n], N[i]X[n], O[]N[i]D[n], O[]N[i]X[n], ...
     b.  bin  --GEN-->  O[b]N[i]D[n], O[b]N[i]X[n], X[b]N[i]D[n], X[b]N[i]X[n], ...
The constraints will disallow certain combinations (by characterizing the regular language whose strings exclude these combinations). (26) illustrates two such constraints (for which the dominance relation for English would be HaveOns ≫ FillOns).25

(26) Constraints as regular expressions
     HaveOns  Syllables must have onsets:        "N[" => "O[" (C) "]" _
     FillOns  An onset position must be filled:  ~$["O[" "]"]
The application of a strict inviolable constraint to a language is modelled by intersection. For example, we could intersect the lower language of the transducer in (25a) with the HaveOns constraint in (26). This would remove the strings lacking an onset (N[i]D[n], N[i]X[n], . . . ) from the language. The properties of regular languages make it possible to precompile the result of such an intersection for all cases covered by a transducer, by simply composing two transducers:26 the composition operation A .o. B is defined as the intersection of the lower language of A and the upper language of B, giving the intended filtering effect. If the constraints we are interested in were not violable, one could thus easily compose the transducers for Gen and all constraints to get a single transducer taking a given input in one step to the structures that satisfy all constraints. (27) illustrates this for the two constraints we discussed, applying the composed transducer to bin.

25 => is the restriction operator, saying that the expression on its left is only allowed in the context defined on its right, using the _ notation, as known from context-sensitive rule notation (double quotes are used to single out object language symbols where necessary). The $ symbol denotes string containment, ~ is finite-state complementation, i.e., FillOns denotes the language of all strings not containing "O[" with a "]" following immediately.

26 A simple automaton can always be looked at as a transducer with identical upper and lower languages. This means that our constraints can also be interpreted as transducers.
(27) bin   GEN .o. HaveOns .o. FillOns   →   O[b]N[i]D[n], O[b]N[i]X[n], ...
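As a sanity check, the effect of this merciless composition can be sketched in a few lines of Python, with the candidate set represented as a set of strings and the two constraints approximated by ordinary regular-expression predicates. The predicate names and toy patterns are my own illustrative stand-ins for the finite-state constraints in (26), not the actual calculus:

```python
import re

# Toy candidate set produced by Gen for the input "bin" (cf. (25)).
candidates = {"O[b]N[i]D[n]", "O[b]N[i]X[n]", "X[b]N[i]D[n]", "X[b]N[i]X[n]"}

def have_ons(s):
    # HaveOns: a nucleus N[ must be preceded by an onset O[...]
    return bool(re.search(r"O\[[a-z]?\]N\[", s))

def fill_ons(s):
    # FillOns: no empty onset O[]
    return "O[]" not in s

# Merciless application = plain intersection, i.e. successive filtering.
filtered = {s for s in candidates if have_ons(s)}
filtered = {s for s in filtered if fill_ons(s)}
assert filtered == {"O[b]N[i]D[n]", "O[b]N[i]X[n]"}   # as in (27)
```

On sets of strings, composition with an inviolable constraint is nothing more than this filtering step.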
Of course, this merciless composition does not meet the interpretation of the violable constraints in OT. Applying the transducer from (27) to in gives us the empty language as the lower side (see (28a)), since the first constraint filters out all strings lacking an onset. The only way for in to have an onset is by overparsing, i.e., introducing an empty onset O[] (see the lower language of (28b), in which only Gen and HaveOns are composed). In (28a), the FillOns constraint will however filter out all such strings, leaving nothing in the lower language. (28)
a.  in   GEN .o. HaveOns .o. FillOns   →   NOT ACCEPTED
b.  in   GEN .o. HaveOns   →   O[]N[i]D[n], O[]N[i]X[n], ...
Karttunen (1998) shows however that an operation can be defined in the finite-state calculus that effectively allows the compilation of (Gen and) violable constraints into a single transducer, with the intended effect of modelling an optimizing competition. He calls this operation lenient composition. Lenient composition takes two transducers A and B and does one of the two following things: (i) if some of the strings characterized by A are compatible with B (while others are potentially not), then B is composed with A to filter out the incompatible strings (if any); (ii) if none of the strings characterized by A satisfy B, then B is ignored, and the result is identical with A.27 If we interpret A as the candidates in an OT competition (the rows in a tableau) that have undergone filtering with respect to all constraints dominating a particular constraint Ci, and we interpret B as the formalization of this constraint Ci, then lenient composition evaluates the effect of the constraint on the set of candidates: in case (i), if some candidates violate the constraint while others don't, the former candidates will be excluded from further consideration (recall the `!' marking a fatal constraint violation in the tableau notation (3));

27 The definition relies on the finite-state version of priority union (Kaplan 1995). Lenient composition, denoted by .O., is defined formally as (i) R .O. C = [R .o. C] .P. R, where .P. denotes priority union, which is in turn defined as follows (.u returns the upper language of a transducer, ~ denotes complementation): (ii) Q .P. R = Q | [~[Q.u] .o. R]
if on the other hand, case (ii), all candidates denoted by A violate the constraint under consideration, they will all stay in the competition. By lenient composition we can form a cascade of Gen and all violable constraints, in the order of the dominance hierarchy. So we will now get a non-empty lower language even when applying the composed transducer to in (cf. (28a)). The application of HaveOns is an instance of case (i), filtering out the candidates containing no onset, whereas the application of FillOns is an instance of case (ii): none of the remaining candidates have a non-empty onset, thus all of them survive this step. (29)
in   GEN .O. HaveOns .O. FillOns   →   O[]N[i]D[n], O[]N[i]X[n], ...
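On the same toy representation used above, lenient composition reduces to a one-line set operation: filter if anything survives, otherwise pass everything through unchanged. The following Python sketch (function names are illustrative, not Karttunen's notation) reproduces the behaviour of (29):

```python
import re

def have_ons(s):
    # HaveOns: a nucleus N[ must be preceded by an onset O[...]
    return bool(re.search(r"O\[[a-z]?\]N\[", s))

def fill_ons(s):
    # FillOns: no empty onset O[]
    return "O[]" not in s

def leniently_compose(cands, constraint):
    """Case (i): if some candidates satisfy the constraint, keep only
    those.  Case (ii): if none do, ignore the constraint entirely."""
    survivors = {s for s in cands if constraint(s)}
    return survivors if survivors else cands

# Gen's candidates for "in", including the overparsed empty onsets:
cands = {"N[i]D[n]", "N[i]X[n]", "O[]N[i]D[n]", "O[]N[i]X[n]"}
cands = leniently_compose(cands, have_ons)   # case (i): onset-less out
cands = leniently_compose(cands, fill_ons)   # case (ii): vacuous
assert cands == {"O[]N[i]D[n]", "O[]N[i]X[n]"}        # as in (29)
```

The actual finite-state construction precompiles this behaviour into a single transducer rather than applying it candidate by candidate, but the optimizing effect is the same.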
The complete OT mechanism can thus be effectively compiled into a single transducer taking an input to the optimal analysis (or analyses). The application of such a transducer in either direction can be performed in linear time (with respect to the length of the input string). What is particularly interesting about Karttunen's computational treatment is that it brings out the relation that the OT approach bears to the classical approaches of rewrite systems and two-level models. There are considerable differences in the way generalizations are expressed (and hence, predictions differ about what in the grammars is universal, what needs to be acquired, and how one can think of this acquisition). But in terms of the computational means needed for processing a particular grammar, the different systems are very much alike: they can all be captured by finite-state transducers.
Restrictions with multiple constraint violations. Karttunen (1998) notes a restriction for a finite-state-based OT system: since automata cannot count (intuitively speaking), special formulations are required for checking a particular number of multiple constraint violations. This means that a given OT grammar can only distinguish between different numbers of multiple violations up to a certain limit. Multiple violations play an important role in theoretical OT work, but the restriction to an upper limit of distinguishable numbers of violations might not be a serious problem.
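The bounded-counting point can be made concrete with ordinary regular expressions: for any fixed k, "at most k occurrences of a violation mark" is a regular language, expressible with bounded repetition, but the bound k must be fixed at compile time. A minimal Python sketch (the marker symbol and helper name are my own):

```python
import re

def at_most(k, sym="e"):
    """Regular language of strings with at most k occurrences of sym.

    Bounded repetition {0,k} keeps the pattern finite-state: the grammar
    can distinguish 0, 1, ..., k violations, but every count above k
    collapses into 'more than k'."""
    other = f"[^{sym}]*"
    return re.compile(f"^(?:{other}{sym}){{0,{k}}}{other}$")

two_or_fewer = at_most(2)
assert two_or_fewer.match("aeae")       # 2 violations: accepted
assert not two_or_fewer.match("eaeae")  # 3 violations: rejected
# "eaeae" and "eeee" are rejected alike: beyond the compile-time
# bound, 3 and 4 violations can no longer be distinguished.
```

Ranking candidates by violation count thus works only up to the precompiled bound, which is exactly the restriction noted in the text.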
4.2 The application to LFG syntax
Karttunen's (1998) lenient composition approach works for any linguistic system that can be formalized in the finite-state calculus. It will not work however for context-free languages and beyond, at least not in the straightforward way, since these classes are not closed under the relevant primitive operations that make up the finite-state calculus. But there are ways of nevertheless exploiting the idea in parts of more complex grammars, possibly avoiding the enormous complexity discussed in section 3.6. Johnson (1998, sec. 4.2) observes about (Bresnan 1998b) that the optimization involved seems to be strictly clause local. He goes on with the consideration: if there are only a finite number of clausal input
feature combinations and candidate clausal structures then it may be possible to precompute for each lexical item the range of input clauses for which it appears in the optimal candidate. Here, I present the sketch of an effective scheme of precompiling an OT competition into an LFG grammar, with the assumption of certain restrictions on the expressiveness of constraints (in particular the size of the structures they apply to). The key idea is to factorize out a certain level of information, the right-hand side of c-structure rules, that can be described by regular expressions and that provides enough clues to precompile the competition for certain types of input. The effect of context-free rule recursion and the computation of f-structure is not anticipated, but is left for the later online application of parsing; all that the precompilation scheme has to ensure is that for each possible input type, the competition has been precomputed, producing a rule that will accept a superset of the optimal candidates. The extra analyses in this superset will all be filtered out by the later LFG parsing (or generation). The resulting grammar works like a classical grammar, i.e., it has the nice property of being reversible; and processing the grammar does not involve the construction of all universally possible candidate analyses.
4.2.1 Constraints on rules

Some crucial properties of the finite-state-based approach that make a precompilation of the competition possible are (i) that a set of candidate analyses can be described by a single (regular) expression, and (ii) that two such set descriptions can be combined to form a new one, without having to compute the extension of the set. This allows for anticipating the successive application of all constraints in the order of ranking to the description of analyses (with the special operation of lenient composition): it will again give us the description of a set. When we actually apply the precompiled system to a particular input, all we need to do is check whether the input is in the set (or, in the relational case of transducers, which pairs in the set contain the input as the first component). When we are dealing with a candidate set that constitutes a context-free or even context-sensitive language, we cannot expect to be able to do the same, even if the constraints are regular: the definition of lenient composition involves complementation for the first operand, and context-free languages are not closed under this operation.28 Another way of precompiling an OT competition is however possible with context-free grammars: assume we have a fairly unconstrained base system of context-free rules, and constraints regulate the form of local trees further. For standard context-free rules with a sequence of nonterminal and terminal symbols as right-hand sides, this set-up does not give us much freedom; but with the extension to regular expressions on the right-hand side, as they have been assumed in LFG from the beginning, we get quite a reasonably expressive system. The simple example (30) may give a flavour of what such a system might look like. A fairly unconstrained base version of the rules (a.) is combined with additional constraints (b.).
28 Ordinary, merciless composition would be possible, since the intersection of a context-free language and a regular language is known to be a context-free language. This demonstrates a significant difference between ordinary and lenient composition: for the former, it is intuitively enough to look at individual analyses one at a time, while the latter involves looking at the alternative analyses also.
(30) a. Rules
        VP → { NP | VP | N' | V' }*
        NP → { NP | VP | N' | V' }*
     b. Rule constraints
        i.  The right-hand side contains a head of the same lexical class as the left-hand side
        ii. The right-hand side contains at most one X'
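To make the rule competition in (30) concrete, here is a minimal Python sketch in which candidate right-hand sides are tuples of category symbols and the two rule constraints are predicates. The sample candidates and the strict filtering are purely illustrative; in the actual system the constraints would be applied by lenient composition, in ranking order:

```python
# Candidate right-hand sides for the VP rule of (30a), as tuples of
# category symbols (a hand-picked sample, not all of { NP|VP|N'|V' }*).
candidates = [
    ("NP", "V'"),   # headed by V', one single-bar category
    ("NP", "N'"),   # head of the wrong lexical class for VP
    ("NP", "VP"),   # no single-bar head at all
    ("V'", "N'"),   # two single-bar categories
]

def head_of_same_class(lhs, rhs):
    # (30b-i): the RHS contains a head of the LHS's lexical class
    return lhs[0] + "'" in rhs

def at_most_one_bar(rhs):
    # (30b-ii): the RHS contains at most one X'
    return sum(1 for cat in rhs if cat.endswith("'")) <= 1

good = [rhs for rhs in candidates
        if head_of_same_class("VP", rhs) and at_most_one_bar(rhs)]
assert good == [("NP", "V'")]
```

Since both the rule bodies and the constraints denote regular languages over category symbols, the same filtering can be precompiled on the expressions themselves rather than on enumerated candidates.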
It should be obvious that with a limitation to individual rules, lenient composition of the original rule with the constraints becomes formally possible, since all are regular languages. However, as it stands, this competition among right-hand sides of rules seems only remotely related to OT syntax, which we are trying to formalize. The limitation to local trees (or single rules) appears to allow the formulation of little more than some basic X-bar principles; furthermore, it is not clear how differences in the candidate sets (or their relation to the input) are captured. These issues will be addressed in turn in the next two subsections.
4.2.2 Is the scheme expressive enough?

The limitation to local trees, essential for the lenient composition system to work, allows one to express more than it might first seem. The key is the exploitation of the regular means of expression like the Kleene star, optionality etc.: there is no reason why the concept of a local tree in the technically motivated context-free structure (which ensures that this OT formalization works) has to coincide with the theoretical concept of a c-structure tree, as long as all theoretically relevant properties of the assumed structure are captured. In principle, arbitrarily large portions of structure that do not involve recursion could be expressed in a single regular right-hand side of a context-free rule. A regular language lacks hierarchical structure, but many relevant structural concepts can be highlighted using marker symbols like the brackets in phonological structure. The resulting construct is even more informative when the structure to be modelled is a binary, right-branching29 structure: precedence in the regular representation models the concept of c-command of the hierarchical representation.30 The abstract structure in (31a) can for example be modelled by the string in (31b).31

29 Using additional marker symbols for precedence in binary trees, generalization to binary branching structure with arbitrary distribution of the branching daughter becomes possible (keeping up the limitation to a single branching daughter).
30 These ideas of restructuring should be seen in the following light: in a non-transformational theory like LFG, in particular in its OT guise, many arguments for directly assuming a certain hierarchical structuring lose validity: constituency of dislocated material is not necessarily a sign that the same amount of material in a non-dislocated paraphrase forms a constituent as well. An issue that future work has to address in this context is coordination. The system doesn't allow for classical constituent coordination of the categories on the spine, like E in (31) for instance. Given the various phenomena subsumed under non-constituent coordination, I would tend to follow the analysis proposed by Maxwell/Manning (1997) for coordination in general. This means that there will be no c-structure rules for coordination, but the relevant representations can be created on the fly by the parsing system on the basis of the given rules.
31 We will not use closing brackets, since their balancing cannot be guaranteed in a regular language; note however that due to the right-branching structure, we don't lose any information this way.
(31) a.      A
            /  \
           B    C
               /  \
              D    E
                  /  \
                 F    G
                     /  \
                    H    I

     b.  [A B [C D [E F [G H I
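The encoding in (31b) is easy to make precise: for a right-branching binary tree, a recursive flattening that emits an opening bracket plus the node label at each internal node yields exactly the closing-bracket-free string. A minimal Python sketch (the nested-triple tree representation is my own):

```python
def encode(tree):
    """Flatten a right-branching binary tree into the bracket string of
    (31b): each internal node (label, left, right) contributes an opening
    bracket plus its label; closing brackets are omitted, which loses no
    information because the structure is uniformly right-branching."""
    if isinstance(tree, str):        # a leaf
        return tree
    label, left, right = tree
    return "[" + label + " " + encode(left) + " " + encode(right)

tree = ("A", "B", ("C", "D", ("E", "F", ("G", "H", "I"))))
assert encode(tree) == "[A B [C D [E F [G H I"
# Precedence models c-command: B precedes everything dominated by C,
# D precedes everything dominated by E, and so on.
```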
Of course, not the complete syntactic structure is strictly right-branching (else, we would be dealing with regular languages): certain nodes in such a skeleton (say, B, F and I in (31a)) will again dominate similar right-branching skeletons. But what is important for our purposes is that syntactic research has come up with units in the overall structure for which such restricted structural variance has an explanatory status. We can thus encode an entire extended projection, i.e., the projection of a lexical category including all functional projections (see Grimshaw 1991; Grimshaw 1997 and Bresnan 1998a for an LFG-based variant), in a single LFG rule. As addressed in section 3.3 above, Grimshaw (1997) assumes that the underlying principles (i.e., Gen) allow arbitrarily many functional projections in an extended projection. Unnecessary projections will be filtered out later, as they violate OT constraints. Even this conception of arbitrarily many functional projections can be captured straightforwardly, with a Kleene star (actually, the potential for arbitrarily many functional projections is simpler to express than listing, say, three particular projections; this may suggest that we are on the right track).32 Likewise, adjunction could be captured with a Kleene star construct. In (33a), the standard tree notation for extended projections (32)33 is reflected in regular notation;34 (33b) shows the same in the notation of the Xerox finite-state calculus. (32)
         FnP
        /    \
      XP      Fn'
             /    \
           Fn      ...
                    |
                   F1P
                  /    \
                XP      F1'
                       /    \
                     F1      LP
                            /   \
                          XP     L'
                                /   \
                               L     XP

(33) a.  [ [FP XP [F' F ]* [LP XP [L' L XP
     b.  [ "FP[" XP "F'[" F ]* "LP[" XP "L'[" L XP
To model the OT grammar assumed by Grimshaw (1997) (or for that matter Bresnan's (1998c) reconstruction thereof), we further have to make all categories optional. For easy reference to the unfilled slots in constraints, we will use markers (an `e'). Note that the

32 For a regular-language-based compilation of an OT competition, an infinite set of candidates is no problem because just the finite representations are handled.
33 L stands for a lexical category, F for a functional category.
34 The smaller square left brackets with subscripts are object-language markers, while the larger square brackets express meta-language grouping.
context-free grammar will ignore these, just like the structure markers `[FP' etc. All XPs and heads will be optionally introduced by disjunctions { XP | e } or { F | e } etc. (or in finite-state calculus:35 [ XP | e ] and [ F | e ]). Now it is straightforward how constraints referring to categorial structure can be expressed. Let us go through a few of them in Grimshaw's formulation, cf. (1). The constraint Stay is easiest to express. It is modelled by the language containing no e (34).

(34) Stay    Trace is not allowed:    ~$e

But also constraints like Ob-Hd can be stated fairly easily (here, we assume that to satisfy this constraint a projection must either have an overt head itself, or there must be an overt head in a c-commanding position): if there is an e in a head position, there must be a non-empty head position further left in the structure (35).36

(35) Ob-Hd    A projection has a head:    [ [ "F'[" | "L'[" ] e ] => "F'[" \e ?* _

So far, the model allows reference only to categorial information. This is certainly too restrictive, since many constraints mention further features. With regular languages we cannot express unification constraints of course, but reference to atomic values for a finite set of features is possible, if we extend the vocabulary or the notation. If we take XPop to represent an XP that is marked as an operator, we can thus also formulate Op-Spec: (36). Adding the restriction that the operators should c-command the entire extended projection, we get (37) (where .#. is a boundary marker marking the beginning of the string).
(36) Op-Spec    Syntactic operators must be in specifier position:
     XPop => [ "FP[" | "LP[" ] _

(37) XPop => .#. [ "FP[" | "LP[" ] _
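For illustration, the three constraints can be emulated over tokenized candidate skeletons in Python; the token-list representation and the predicate formulations are simplifying stand-ins for the regular expressions in (34) through (37), not the finite-state calculus itself:

```python
# Candidate skeletons as lists of marker tokens; the predicates emulate
# the regular constraints over this toy representation.

def stay(c):
    # (34) Stay: the candidate contains no empty slot e
    return "e" not in c

def ob_hd(c):
    # (35) Ob-Hd: an e in head position (right after F'[ or L'[) needs
    # an overt head (F'[ followed by a non-e token) further left
    for i, tok in enumerate(c):
        if tok == "e" and i > 0 and c[i - 1] in ("F'[", "L'["):
            if not any(c[j] == "F'[" and c[j + 1] != "e"
                       for j in range(i - 1)):
                return False
    return True

def op_spec(c):
    # (37) Op-Spec: an operator XP only in the string-initial specifier
    for i, tok in enumerate(c):
        if tok == "XPop" and not (i == 1 and c[0] in ("FP[", "LP[")):
            return False
    return True

good = ["FP[", "XPop", "F'[", "F", "LP[", "XP", "L'[", "L", "XP"]
bad  = ["FP[", "XPop", "F'[", "e", "LP[", "XP", "L'[", "L", "XP"]
assert stay(good) and ob_hd(good) and op_spec(good)
assert not stay(bad) and not ob_hd(bad)
```

In the real system these would of course be cascaded leniently in ranking order rather than checked as inviolable filters.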
The LFG rules must also contain the appropriate f-annotations (e.g., (↑ OBJ)=↓) to construct the correct f-structure. These will include functional uncertainty equations describing arbitrarily long feature paths, and providing considerable descriptive power. A grammar will however contain only a finite number of such f-annotations; we can thus postpone all effects of such annotations until the online application of the parser, like we already do with the recursive aspects of context-free rules. The annotations themselves may certainly be relevant in the formulation of constraints, and we can again assume either an extension of the vocabulary or some special notation that accepts f-annotations as part of the strings we are dealing with. In the present sketch, we will not go into this.
4.2.3 Taking the input and the lexicon into account
We have seen that with the means of regular expressions, constraints local to an extended projection can be effectively formulated. It is not yet clear however in which way the system is sensitive to a given input, in particular when the regular expressions are effectively used as
35 I will adopt the finite-state calculus in the following, since it provides many useful predefined operations; unfortunately, the notation is slightly different from the regular notation standardly used in LFG rules.
36 The constraint should be read as follows: an e after either "F'[" or "L'[" is restricted to the context where to its left, separated by arbitrarily many symbols (?*), there is the sequence "F'[" followed by some symbol other than e.
the right-hand sides of context-free rules. To make the problem more transparent, let us look at the system behaviour we would get if we ignored the role of the input (and the lexicon). Suppose we take a general expression of structurally possible extended projections like (38) as the basis for a lenient cascade of the constraints. (38)
[ "FP[" [ XP | XPop | e ] "F'[" [ F | e ] ]* "LP[" [ XP | XPop | e ] "L'[" [ L | e ] [ XP | XPop | e ]
This expression will generate all kinds of skeleton structures, with all combinations of filled and unfilled positions, and of operator-XPs and ordinary XPs. Let us ignore the constraint cascade for a moment and assume that we use expression (38) directly as the right-hand side of a rule. When we add LFG f-annotations and apply the rules to actual lexicon entries, the abundance of possibilities will be severely restricted by the subcategorization information of the lexical elements involved, which is enforced by the LFG principles of functional uniqueness, completeness and coherence. The LFG division between c-structure and f-structure principles thus allows one to keep the individual formulations general, without losing effectiveness in the resulting interplay. So it is no problem for the ultimate result if the regular expression generates infinitely many, often absurd sequences (e.g., containing no heads, or 200 XPs). Hence, when applying the lenient cascade of constraints, we need not worry if the c-structure rules allow for too many possibilities. This is not a problem since the rules will be used in an LFG grammar, and the f-structure based principles will apply. However, the opposite situation may cause a problem: through application of the constraints, all possible c-structures required to express a certain f-structure in a complete and coherent way may have been filtered out, since there were cheaper competing c-structures. These remaining cheaper c-structures do not provide the right kind of slots to express the f-structure. As it stands, the lenient cascade is blind to later f-structure requirements. Take for example the constraints we mentioned in the previous subsection, and assume that Stay, disallowing the presence of e, is at work (no matter how highly ranked). Since there is no other constraint favouring e, all candidates containing an e will simply be filtered out.
Now, if the satisfaction of another constraint like Op-Spec motivated the use of some extra functional projection, all available rule variants will have filled slots in all lower projections. For instance we will get (39a) but not (39b).

(39) a.  "FP[" XPop "F'[" F "LP[" XP "L'[" L XP
     b.  "FP[" XPop "F'[" F "LP[" XP "L'[" L e
Ignoring irrelevant details about the subject position, the latter could have been the right structure to express Who will she see, and the extra XP in the structure we actually get will be inappropriate due to functional uniqueness, given a transitive verb. So the resulting system will not allow us to express a question with a wh-object. The reason is quite obviously that we ignored the input, and thus the candidate set was too large. The problem is however that the exact specification of the relation between input and full analysis would require the full power of a formalism like LFG (which we are making an effort to avoid at this stage). The strategy that I propose as a solution follows the general idea: if the violable constraints can be expressed local to a single LFG rule (an assumption that was argued for in
the previous subsection), one can apply them ahead of the functional principles. What we have to do with respect to the input is ensure that the competition is precompiled for types of candidate sets that are restrictive enough to exclude such an unfair competition as just observed. Thus, in the offline competition we deal with a superset of the real OT candidates, and we leave the application of further (inviolable) principles for the online processing. Technically, this can be achieved in the definition of the upper side of the initial transducer in the lenient cascade. This upper side has to reflect the relevant feature distinctions in all possible combinations. This can be easily done by specifying a very general regular expression listing all individual choices as optional.37 Another factor that influences the outcome of a competition is the availability of lexicon entries and embedded phrases (and their potential to realize certain features); thus we also encode for these the choice of possibilities in the string on the upper side. One can think of the specification as a regular expression roughly like (40) (specifying an unrestricted inventory of possible feature/realization pairings, without committing to a particular structuring at this point).38 (40)
([ "(SUBJ OP)=-" | "(SUBJ OP)=Q" ]) [ XP | XPop ]* ([ "(OBJ OP)=-" | "(OBJ OP)=Q" ]) [ XP | XPop ]* [ "(TNS)=FUT" | "(TNS)=PRES" | "(TNS)=PAST" ] [ AUXfut | AUXpast | AUXpres ]* [ Vfut | Vpast | Vpres ]*
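The upper-side specification in (40) amounts to a finite enumeration of abstract input types. A Python sketch of that enumeration, with the feature inventory abbreviated to the items shown in (40) (the names are illustrative, not a fixed inventory):

```python
from itertools import product

# Offline enumeration of abstract input types, after (40).  Each feature
# slot is either absent (None) or takes one of its atomic values.
subj_op = [None, "(SUBJ OP)=-", "(SUBJ OP)=Q"]
obj_op  = [None, "(OBJ OP)=-", "(OBJ OP)=Q"]
tense   = ["(TNS)=FUT", "(TNS)=PRES", "(TNS)=PAST"]

# One tuple per abstract input type; a separate OT competition would be
# precompiled for each of them.
input_types = [tuple(f for f in combo if f is not None)
               for combo in product(subj_op, obj_op, tense)]
assert len(input_types) == 27        # 3 * 3 * 3 combinations
```

The point of the finiteness is exactly that such a product can be spelled out at compile time, so the cascade never has to inspect an actual f-structure.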
Faithfulness constraints will ensure (when not violated) that the lower side of the transducer is in keeping with the input information specified on the upper side. Some relations may also be enforced as inviolable. For the example discussed under (39), we have as one of the possible input configurations a verb with two argument slots, combined with exactly one argument XP for each slot (and thus (39a) will either be no competitor for (39b) or it will violate highly ranked faithfulness constraints). Of course, restricting a verb with two argument slots to the combination with two argument XPs would mean the anticipation of completeness and coherence, which we wanted to avoid. For this reason there are also (abstract) input configurations for a verb with two argument slots and with more or fewer than two argument XPs in the projection. For each option, the lenient cascade (implicitly) precompiles a separate OT competition. A last point we have to ensure is that the resulting optimal right-hand sides will be applied only in the appropriate situation, i.e., when the actual input in the online computation matches the abstract input classes that underlay the offline precompilation. This is done by introducing special f-annotations to the rule, which enforce the underlying feature distinctions (basically, we have to copy the feature equations in (40) from the upper side of the transducer to the lower side). Some of the resulting optimal rule candidates may be unusable, given the f-structure restrictions. This reflects the fact that the competition among candidate rules does not model the whole of the abstract Gen in OT syntax. In the abstract definition, only candidates are generated that satisfy the underlying f-structure principles. But in order to exploit

37 Note that the offline computation of the lenient cascade might however become expensive if the underlying regular expression gets very large.
38 For illustrative purposes, the feature specifications are given here as atoms of the regular language (enclosed in double quotes). Within the quotes, XLE notation is used.
precompilation, we can factorize out a certain dimension of the Gen restrictions, which we can use in a lenient cascade with all constraints (provided they can all be formulated as regular expressions, and relative to a non-recursive phrase structure domain). Applying the f-structure principles only to the result of the OT cascade has the same net effect as the highly complex online competition that follows the serialization in the abstract definition.
4.3 The relation to classical LFG
What is the status of the transducer modelling lenient composition in relation to the LFG grammar? We already said that the lower side is intended to model the right-hand side of a rule. But what about the upper side? The role of the input in the lenient cascade for syntax is fairly abstract. All it does is provide a classification of possible inputs which controls the competition. Hence the upper language of the transducer implementing the cascaded OT competition has purely internal relevance. All interface information is represented in the lower language, which contains encodings of (i) input types (expressible as LFG f-annotations) and (ii) an optimal rule based on that input type. The lower language of the OT transducer can be represented by a single finite-state automaton. This automaton provides the entire relevant information. Recall that we can interpret the right-hand side of a classical LFG rule as a finite-state automaton. Using the special automaton constructed through the lenient cascade instead of an automaton representing a fairly simple manually specified regular expression makes no formal difference. As with Karttunen's (1998) result for OT phonology, this shows that under the locality assumptions the difference between an OT model of LFG and the classical model lies mainly in the way generalizations are expressed (and thus in the explanatory dimension), not so much in the requirements of the underlying computational model. This gives a natural explanation for the initially observed convergence between, on the one hand, OT-related work in the LFG literature and, on the other hand, recent developments in the classical tradition of LFG, attempting to provide a general but restricted theory for central areas like the mapping from c-structure to f-structure (Bresnan 1998a, ch. 6-7).
The principle of Economy of Expression assumed in the latter strain rests on the possibility of comparing analyses and could be formalized along the lines of the analysis presented in this section. Being formally a classical LFG grammar, an OT grammar following this formalization can certainly be used not just for generation from an input f-structure, but also in the parsing direction. Since the OT competition has been precompiled under a generation perspective, and is implicitly encoded in the rules, ordinary parsing will solve the universal parsing problem for OT-LFG ((16) in section 3.4). This means that the OT grammar will accept only grammatical analyses, and thus avoids both of the potentially severe complexity problems we noted for the system in section 3: (i) the effective generation of all candidate analyses, and (ii) the need for backward generation from each parsing analysis to solve the universal parsing problem. Of course, the system inherits the formal restriction with respect to multiple constraint violations from Karttunen's (1998) OT phonology account (as discussed at the end of section 4.1). This means that one has to specify at compile time an upper limit for the number of constraint violations that should be considered.
5 Conclusion

This paper presented two ways of formalizing OT in the LFG framework. The first one, discussed in section 3, is close to the model that Bresnan (1998c) assumes, using a classical LFG grammar to generate the set of candidate analyses. The identification of constraint violations is folded into the same grammar, such that external to the grammar, just an evaluation component is needed that computes the optimal candidate based on the constraint violations and the language-specific constraint hierarchy. The model involves an effective generation of all candidates (or a packed representation of all candidates). This may pose a complexity problem for non-trivial fragments, in particular since the underlying grammar is by definition highly unrestricted and will thus produce an enormous number of analyses. For solving the parsing problem, an additional backward generation even has to be performed for each parsing analysis. There may however be ways of optimizing the backward generation problem, in particular when the constraint system allows an extension to bidirectional competition. The other formalization, discussed in section 4, uses a precompilation technique for the OT competition, adapting the OT phonology system of Karttunen (1998) to LFG. It factorizes the task of generating candidate analyses into (i) a step that is local to the extended projections of (Grimshaw 1991; Grimshaw 1997; Bresnan 1998a), and (ii) the remaining task of completing an LFG analysis, with the context-free analysis and the f-structure computation. If the OT constraints can be expressed by regular expressions local to extended projections (which seems to be the case for systems like the one in Grimshaw 1997), the competition can be effectively precompiled. The resulting grammar is formally a classical LFG grammar. While the sketch in the present paper shows the principled feasibility of an offline competition in syntax, more work on the details is certainly required.
The second formalization provides no access to suboptimal analyses. This is not a limitation if optimality is to model grammaticality, as in standard OT. But in an optimality-as-preference model, the situation may be different: under certain circumstances, later processing steps may have to get back to the set of analyses and consider some of the suboptimal analyses. A possible set-up that could accommodate this without losing the favourable complexity properties of the second formalization would be the following: a precompiled cascade is generally used for the first pass in processing a sentence, and the system falls back to the first technique (or alternatively, a more generous cascade) only in the case of failure. This strategy would cause the garden path effect known from human sentence processing for phenomena not accommodated in the grammar underlying the first pass.
References
Abraham, W. (1986). Word order in the middle field of the German sentence. In W. Abraham and S. de Meij (Eds.), Topic, Focus, and Configurationality, pp. 15-38.
Bresnan, J. (1996). LFG in an OT setting: Modelling competition and economy. In M. Butt and T. H. King (Eds.), Proceedings of the First LFG Conference, CSLI Proceedings Online.
Bresnan, J. (1998a). Lexical-Functional Syntax. Book manuscript, Summer 1998, ch. 6-8 in reader for ESSLLI'98 course: Bresnan/Sadler, Modelling Dynamic Interactions between Morphology and Syntax. Saarbrücken.
Bresnan, J. (1998b). The lexicon in optimality theory. In Proceedings of the 11th Annual CUNY Conference on Human Sentence Processing, Rutgers University. To appear.
Bresnan, J. (1998c). Optimal syntax. In J. Dekkers, F. van der Leeuw, and J. van de Weijer (Eds.), Optimality Theory: Phonology, Syntax, and Acquisition. Oxford University Press. To appear.
Butt, M., T. King, M.-E. Niño, and F. Segond (to appear). A Grammar-Writer's Cookbook. CSLI Publications.
Butt, M., M.-E. Niño, and F. Segond (1996). Multilingual processing of auxiliaries within LFG. In Proceedings of KONVENS 1996, Bielefeld.
Choi, H.-W. (1999). Optimizing Structure in Context: Scrambling and Information Structure. CSLI Publications.
Dalrymple, M. (Ed.) (1999). Semantics and Syntax in Lexical Functional Grammar: The Resource Logic Approach. MIT Press.
Dalrymple, M. and R. M. Kaplan (1997). A set-based approach to feature resolution. In M. Butt and T. H. King (Eds.), Proceedings of LFG-97, CSLI Proceedings Online, University of California, San Diego.
Dalrymple, M., R. M. Kaplan, J. T. Maxwell, and A. Zaenen (Eds.) (1995). Formal Issues in Lexical-Functional Grammar. Stanford, CA: CSLI Publications.
Dymetman, M. (1991). Inherently reversible grammars, logic programming and computability. In Proceedings of the ACL Workshop: Reversible Grammar in Natural Language Processing, Berkeley, pp. 20-30.
Eisner, J. (1997). Efficient generation in primitive optimality theory. In Proceedings of the ACL 1997, Madrid. (ROA-206-0797).
Ellison, M. (1994). Phonological derivation in optimality theory. In COLING 1994, Kyoto, Japan, pp. 1007-1013. (ROA-75-0000).
Frank, R. and G. Satta (1998). Optimality theory and the generative complexity of constraint violation. To appear in Computational Linguistics. (ROA-228-1197).
Frank, A., T. H. King, J. Kuhn, and J. Maxwell (1998). Optimality theory style constraint ranking in large-scale LFG grammars. In M. Butt and T. H. King (Eds.), Proceedings of the Third LFG Conference, CSLI Proceedings Online.
Grimshaw, J. (1991). Extended projection. Unpublished manuscript, Brandeis University.
Grimshaw, J. (1997). Projection, heads, and optimality. Linguistic Inquiry 28(3), 373-422.
Grimshaw, J. (to appear). The best clitic: constraint conflict in morphosyntax. In L. Haegeman (Ed.), Handbook of Contemporary Syntactic Theory. Dordrecht: Kluwer.
Hammond, M. (1997). Parsing syllables: modeling OT computationally. (ROA-222-1097).
Höhle, T. (1982). Explikation für `Normale Betonung' und `Normale Wortstellung'. In W. Abraham (Ed.), Satzglieder im Deutschen. Tübingen: Gunther Narr Verlag.
Johnson, M. (1998). Optimality-theoretic lexical functional grammar. In Proceedings of the 11th Annual CUNY Conference on Human Sentence Processing, Rutgers University. To appear.
Johnson, M. (1999). Type-driven semantic interpretation and feature dependencies in R-LFG. In Dalrymple (1999).
Kaplan, R. M. (1995). Three seductions of computational psycholinguistics. In Dalrymple/Kaplan/Maxwell/Zaenen (1995).
Kaplan, R. M. and J. W. Bresnan (1982). Lexical-functional grammar: a formal system for grammatical representation. In J. W. Bresnan (Ed.), The Mental Representation of Grammatical Relations, Chapter 4, pp. 173-281. Cambridge, MA: MIT Press.
Kaplan, R. M. and J. T. Maxwell (1988). An algorithm for functional uncertainty. In Proceedings
31
of COLING-88, Budapest, pp. 297302. Reprinted in Dalrymple et al. (1995), pp. 177-197. Karttunen, L. (1998). The proper treatment of optimality in computational phonology. In Proceedings of the International Workshop on Finite-State Methods in Natural Language Processing, FSMNLP'98, pp. 112. (ROA-258-0498). Keller, B. (1993). Feature Logics, Innitary Descriptions and Grammar. CSLI lecture notes, no. 44. Stanford, CA: CSLI. Keller, F. (1998). Gradient grammaticality as an eect of selective constraint re-ranking. In M. C. Gruber, D. Higgins, K. Olson, and T. Wysocki (Eds.), Papers from the 34th Annual Meeting of the Chicago Linguistic Society, Volume 2: The Panels, Chicago. Lenerz, J. (1977). Zur Abfolge nominaler Satzglieder im Deutschen. Tübingen: Gunther Narr Verlag. Maxwell, J. and R. Kaplan (1989). An overview of disjunctive constraint satisfaction. In Proceedings of the International Workshop on Parsing Technologies, Pittsburgh, PA. Maxwell, J. and C. Manning (1997). A theory of non-constituent coordination based on nite-state rules. In M. Butt and T. H. King (Eds.), On-line Proceedings of the First LFG Conference, Grenoble. Prince, A. and P. Smolensky (1993). Optimality theory: Constraint interaction in generative grammar. Technical Report Technical Report 2, Rutgers University Center for Cognitive Science. Sells, P. (1998). Optimality and economy of expression in Japanese and Korean. Number 7 in Japanese/Korean Linguistics, pp. 499514. CSLI, Stanford Linguistics Association. Smolensky (1996). On the comprehension/production dilemma in child language. (ROA-118-0000). Tesar, B. B. and P. Smolensky (1998). Learnability in optimality theory. Linguistic Inquiry 29 (2), 229268. Uszkoreit, H. (1987). Word Order and Constituent Structure in German. Number 8 in CSLI Lecture Notes. Stanford, CA: CSLI. Wedekind, J. (1995). Some remarks on the decidability of the generation problem in LFG- and PATR-style unication grammars. 
In Proceedings of the 7th EACL Conference, Dublin, pp. 4552. Wedekind, J. (to appear). Semantic-driven generation with LFG- and PATR-style grammars. Computational Linguistics .
32