A processing account of weak crossover
Carl Alphonce
University of British Columbia
Georgopoulos (1991) claims there is a linearity component to weak crossover in Palauan. I propose a processing-based account which factors out the linearity component from Georgopoulos’ explanation and attributes it to the inherent linearity of processing. I present a computationally motivated chain-building algorithm, embedded within a psycholinguistically plausible parsing model, and show how it offers an account of weak crossover, considering data from English and Palauan. I proceed to speculate that an algorithm for functional determination of empty elements based on their licensing requirements may offer an account of weak crossover facts in Hebrew.
1. Introduction

In this paper I will present a chain-building algorithm and show how it offers an account of the weak crossover phenomenon. The chain-building algorithm is independently motivated by computational considerations, and is embedded in a psycholinguistically motivated parsing model. The paper proceeds as follows: I first present arguments for the reasonably uncontroversial position that there is a distinction to be made between linguistic competence and performance. I then outline and motivate my assumptions regarding structure-building in general and chain-building in particular. I argue that the existence of so-called garden path utterances is evidence that parsing does not proceed in a massively parallel fashion. Ranked-parallel models and serial models are both conceivable. I assume a serial model on the basis that it is the more constrained model; if language processing cannot be accounted for using such a model, we will have made progress. With respect to chain-building, I motivate my assumption that empty categories are postulated during on-line processing, and that the presence or absence of an identified filler influences the parser’s decision about what empty category, if any, to postulate.
I would like to thank Henry Davis, Hamida Demirdache, and Michael Rochemont for valuable discussion and the NWLC reviewers for insightful criticisms. All shortcomings are attributable to me.
Address: Department of Computer Science, 2366 Main Mall, Vancouver, Canada, V6T 1Z4. E-mail:
[email protected]
I present my chain-building algorithm, and show how it accounts for weak crossover data from both English and Palauan. Georgopoulos (1991) claims that there is a linearity component to weak crossover in Palauan. If her analysis is correct, then my account is to be preferred over standard accounts, as it attributes the linearity component to the left-to-right nature of the parsing process. If, however, the linearity effect is a red herring, brought about by scrambling, then my account simply offers another way of looking at the problem of weak crossover. I propose a speculative extension to my chain-building model which broadens its empirical coverage in an interesting direction. I argue that serial processing supports a view of empty category identification based on a form of functional determination, rather than intrinsic feature specification or free assignment of features. This serves as the basis for the extended chain-building algorithm. The enriched algorithm incorporates two different mechanisms for establishing long-range dependencies: (unselective) binding and A′-chain formation (see discussion of Tsai (1994)). Returning finally to weak crossover, I show how this might account for weak crossover facts in Hebrew.

2. Performance ≠ competence

Processing natural language is something people do very easily, but theoretical analysis suggests that it should be very difficult. Ristad (1990) argues that the recognition problem[1] for human natural language is NP-complete. This indicates that there is no deterministic polynomial time algorithm known to solve this problem: indeed, all known deterministic algorithms require at least time exponential in the length of the input string.[2] The parsing problem[3] is at least as hard as the recognition problem, and yet people parse natural language input with little effort, apparently in linear time. How can we reconcile these seemingly conflicting results?
We might deny Ristad’s theoretical result and claim that if we just had a deeper understanding of natural language then its grammar would reveal itself to be computationally tractable. We might claim that people do not use grammar while processing, but rely solely on simple processing heuristics. This would allow structure to be built in a simple manner, but is at odds with what we know about language processing. The simple heuristics would have to reconstruct large parts of the grammar. I will instead follow a path which seeks to bridge the gap between the reality of performance and the complexity of competence with a parsing model which explicitly makes use of the competence grammar, but which constrains the problem enough to make it tractable (see Marcus (1980), Berwick and Weinberg (1984), Pritchett (1992), Frank (1992), and Gorrell (1995), among many others).

[1] Stated in general terms, this is the problem of determining whether a string of symbols drawn from an alphabet is or is not a member of a given language defined over that alphabet.
[2] It is an open question whether the class of deterministic polynomial time solvable problems (P) is coextensive with the class of non-deterministic polynomial time solvable problems (NP), though it is generally thought that P ≠ NP (see Garey and Johnson (1979)).
[3] This is the problem of not only deciding membership of a string in a language, but also of determining the structural analyses assigned to strings in the language by the grammar of the language.
3. Serial processing

One dimension along which processing models vary is whether they are serial, pursuing one analysis at a time, or parallel, pursuing more than one analysis at a time. Certainly unconstrained parallelism is not realistic as a model of human linguistic processing, since an utterance can be infinitely ambiguous. Consider the following example from Japanese (from Mazuka (1991)).[4]

(1) a. John ga Mary o ...
       John N Mary A ...
    b. John ga [e Mary o mita] otoko o ...
       John N [ Mary A saw] man A ...
    c. John ga [e [e Mary o mita] otoko o yobidasita] onna ga ...
       John N [ [ Mary A saw] man A called] woman N ...
Mary o may be embedded to an arbitrary depth; the depth of embedding remains unknown until later in the utterance. But even a constrained parallel model is problematic given the existence of garden path utterances.

(2) The boat floated sank.

If multiple analyses were pursued, the difficulty encountered in examples such as this would be unexpected. Ranked-parallel models, in which only those analyses which are within a certain distance of the “best” analysis will be pursued, can account for garden paths, but are difficult to distinguish from serial models. I will assume that the human parser operates in a serial manner, pursuing only one analysis at a time, for two reasons. The first is that human short-term memory is severely limited, casting some doubt on parallel models, which are more memory-demanding than serial models, all else being equal. The second is that a serial model is more constrained. From a purely theoretical point of view, until we have evidence which forces a choice between a serial model and a ranked-parallel model, the serial model is to be preferred.

[4] N = Nominative, A = Accusative.
4. Fillers and gaps

Bever and McElree (1988) and MacDonald (1989) show that gaps reactivate their antecedents, demonstrating that gaps are identified during on-line processing. Reporting similar findings for Japanese, Nakayama (1990) shows that antecedent reactivation is not a property unique to English. Crain and Fodor (1985) report on what is known as the “filled-gap effect”. They found that object noun phrases in the scope of an unresolved wh-dependency take longer to process than they do outside the scope of the wh-dependency (as in a declarative sentence). Stowe (1986) replicated Crain and Fodor’s finding, extending it to show that the filled-gap effect obtains after prepositions but not in subject position. The significance of the filled-gap effect is that it shows that the presence of a filler affects how processing proceeds. The lack of a filled-gap effect in subject position in English is expected under a filler-driven approach, because the lexical NP is processed before the subject position is identified. Frazier and Flores D’Arcais (1989) give evidence from Dutch which supports filler-driven processing. The upshot is that once a filler has been identified, the parser operates under a set of assumptions regarding what might fill empty positions which is different from the one it employs when no such filler has been identified. Thus, processing context determines behaviour.
5. Chain-building algorithm

It is natural to assume that chain-building makes demands of the parser above those needed for normal structure building. The filled-gap effect indicates that the parser tries to terminate chain-building as soon as possible. When building a chain C and the next element to be processed is the element X, the parser needs to decide (i) whether X should be incorporated into C or not, and (ii) what index X should bear (in particular, should X be coindexed or contraindexed with C?). If X is incorporated into the chain, then X must be coindexed with the chain. I propose that the parser also adopts the following rule: if X is not incorporated into the chain, then X must be contraindexed with the chain. This means that, while constructing a chain, the parser does not consider the question of chain membership and the question of indexing independently. The proposed chain-building algorithm appears in Figure 1.

1. Initialization
   When construction of a new chain begins ...
   (a) if it is an operator-variable chain, assign a new index (one that has not yet appeared in the current structure) to the chain,
   (b) otherwise, the index assigned to the chain need not be new.
2. Chain-building
   While a chain C is being constructed, and the next element to be processed is X ...
   (a) if the head of C c-commands X and if the tail of C fails to c-command X, then X is a candidate for chain membership:[5]
       i. if a candidate is compatible with the chain (e.g. the candidate is of the correct category, and is compatible with the licensing specification of the chain) then coindex X with C (incorporating X into C),
       ii. otherwise contraindex X and C.
   (b) If X is not a candidate, index X freely (X can thus bear the index of the chain, but will not be considered to be a member of the chain).
3. Otherwise
   If a chain is not being constructed, an element X obviously will not be considered to be a candidate for membership in a chain, and so can be indexed freely.

Figure 1: The chain construction algorithm
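The decision procedure of Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not an implementation taken from the paper: the names `Chain`, `start_chain` and `index_element` are hypothetical, and the c-command relations and the compatibility check are supplied as booleans by the caller, since computing them would require a full phrase-structure representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Chain:
    """A chain under construction (hypothetical representation)."""
    index: int
    members: List[str] = field(default_factory=list)


def start_chain(used: Set[int], operator_variable: bool,
                reuse: Optional[int] = None) -> Chain:
    """Step 1: an operator-variable chain receives an index not yet used."""
    if operator_variable or reuse is None:
        index = max(used, default=0) + 1   # fresh index
    else:
        index = reuse                      # the index need not be new
    used.add(index)
    return Chain(index)


def index_element(chain: Optional[Chain], x: str,
                  head_ccommands_x: bool, tail_ccommands_x: bool,
                  compatible: bool) -> str:
    """Steps 2 and 3: decide how X is indexed relative to the chain."""
    if chain is None:                      # step 3: no chain being built
        return "free"
    candidate = head_ccommands_x and not tail_ccommands_x
    if not candidate:                      # step 2(b)
        return "free"
    if compatible:                         # step 2(a)i: incorporate X
        chain.members.append(x)
        return "coindexed"
    return "contraindexed"                 # step 2(a)ii


used: Set[int] = set()
who = start_chain(used, operator_variable=True)
# Pronoun met before the tail of the chain c-commands it, and incompatible:
crossover = index_element(who, "her", True, False, False)
# Pronoun c-commanded by both the head and the tail of the chain:
no_crossover = index_element(who, "her", True, True, False)
```

Under these assumptions the first call models a pronoun encountered while the chain is still open, which is forced to be contraindexed, while the second models a pronoun encountered after the tail is in place, which may be indexed freely.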
[5] This definition of candidacy is perhaps unintuitive. An alternate definition is that X is a candidate if the head of C c-commands X and X c-commands the tail of C. There are empirical differences between the two definitions, however. The difference becomes important in those cases where neither X nor the tail of C c-commands the other. For example, under this alternate definition a parasitic gap in a subject (as in “Who did my staring at bother?”) is not considered a candidate, whereas with the definition in the main text such a parasitic gap is considered a candidate. Moreover, under the alternate characterization of chain candidacy just proposed, the pronoun in a weak crossover construction is not considered a candidate, which leads to the undesirable consequence that it can be indexed freely by the chain-building algorithm, possibly resulting in accidental binding of the pronoun by the operator.

5.5. Rightward chain construction

The elements of an utterance are perceived in a certain order. This linearity of processing (often referred to as left-to-right processing), combined with a filler-driven recovery of filler-gap dependencies, implies that LF movements will involve chain-building from the foot site to the head site (rather than from the head site to the foot site); since processing is left-to-right, chain construction will take place while material to the right of the foot site is being processed. In Figure 2 two chain constructions are represented. The chain represented by the solid line from A to D shows how the parser would recover a dependency formed through overt movement; the one represented by the dashed line from C to E shows how the parser would establish a dependency formed via LF movement. Note that this assumes a certain flexibility in phrase structure for positions which are the landing sites of LF movements, as these positions must be able to appear on the right hand side of the projection.[6] Davis and Alphonce (1992) and Alphonce and Davis (Forthcoming) discuss the interaction of linearity of processing and chain construction in more detail.

Figure 2: Linearity in chain construction

[6] This is strictly speaking not necessary from the parser’s point of view, though it does seem reasonable from a grammatical perspective (see Davis and Alphonce (1992), Alphonce and Davis (Forthcoming)). The parser can just as well build the structure with a specifier on the left, but only do so at the point in processing represented by the placement of E in the diagram.

6. English and Palauan data

In this section I present weak crossover data from English, and review how a variety of approaches account for the data. I then present weak crossover data from Palauan and discuss the difficulties posed by these data. In the next section I show how the given chain-building algorithm accounts for both the English and the Palauan data.

(3) a. * Who_i do her_i neighbors respect t_i?
    b.   Who_i t_i respects her_i neighbors?
    c. * Her_i neighbors respect everyone_i.
    d.   Everyone_i respects her_i neighbors.

The indicated coindexations are allowed in (3b) and (3d), disallowed in (3a) and (3c). Many accounts of weak crossover have been explored in the literature, among them the Bijection Principle (Koopman 1982), the Parallelism Constraint on Operator Binding (Safir 1984), and the Leftness Condition (Chomsky 1976).

The Bijection Principle requires that there be a bijective mapping of operators to variables. The allowable readings in (3b) and (3d) arise from A-binding of the pronoun by the variable (the A′-trace). A-binding is not available in (3a) and (3c) because the variable does not c-command the pronoun in these cases. The Bijection Principle rules out both (3a) and (3c) because the operator cannot bind both the pronoun and the variable.

The Parallelism Constraint on Operator Binding (PCOB) does not constrain the number of variables bound by an operator, but instead requires that all variables bound by a given operator be of the same pronominal type [±pronominal]. Examples (3b) and (3d) are ruled in for the reason given above: the pronoun is taken to be A-bound by the variable. The PCOB rules out (3a) and (3c) because the two bindees do not agree in their pronominal specifications: the variable is [−pronominal] while the pronoun is [+pronominal].

The Leftness Condition simply stipulates that a pronoun may not be coindexed with a variable to its right. Examples (3b) and (3d) are ruled in as above, while (3a) and (3c) are ruled out because the pronoun occurs to the left of the variable.
Let us now turn to Palauan. In this language wh-phrases may either be moved overtly or remain in-situ; the same freedom of choice exists with quantifiers. Georgopoulos assumes that, when moved, both wh-phrases and quantifiers occupy A′ positions.[7] Consider the following Palauan data (Georgopoulos 1991:198–203):

(4) a. ng-te’a_i a lilsa e_i [ a retonari er ngii_i ]
       who IR-3-saw-3s neighbors P her
       ‘Who_i did her_i neighbors see t_i?’
    b. ?? temilsa a te’a_i a retonari er ngii_i
       3p-saw-3s who? neighbors P her
       ‘Who_i did her_i neighbors see?’
    c. * temengull er a [ rebek el ’ad ]_i a retonari er tir_i
       3p-respect P every person neighbors P their
       ‘Their_i neighbors respect everyone_i’
    d. a [ rebek el ’ad ]_i [IP a longull er tir_i a retonari er tir_i ]
       every person 3-respect P them neighbors P their
       ‘Their_i neighbors respect everyone_i’
    e. a [ rebek el ’ad ]_i a mengull er [ a retonari er tir_i ] t_i
       every person R-respect P neighbors P 3p
       ‘Everyone_i respects their_i neighbors’
I will concentrate on the first two examples here, as (4c–e) make the same point with respect to quantifiers which (4a–b) make with respect to wh-phrases. The Leftness Condition correctly rules (4a) in, while both the Bijection Principle and the PCOB rule it out. On the other hand, the Leftness Condition allows the ungrammatical (4b), which both the Bijection Principle and the PCOB correctly exclude. Georgopoulos assumes that in the grammatical cases the pronoun is A′-bound by the operator. She constrains this operator-variable binding relation by proposing that a pronoun can only be interpreted as a variable if it is both c-commanded and preceded by its operator. While a c-command component to binding (whether it be A- or A′-binding) is expected from a grammatical perspective, a linearity constraint seems stipulatory. Such a constraint is naturally cast as a parsing requirement, since parsing by its very nature is a linear process. Such a reinterpretation is what I have attempted in the chain-building algorithm given above. In the next section, I demonstrate how it works.

[7] Michael Rochemont (p.c.) notes that Georgopoulos’ data can be reanalysed as involving local scrambling of the overtly moved wh-phrases and quantifiers into an A-position. If such an analysis is feasible, then it steals some of the explanatory wind from my algorithmic sails. For the moment, I assume that Georgopoulos’ analysis is correct. Rochemont’s suggestion will be addressed in more detail in section 8.
[CP [NP who_i] [C′ C [IP [NP t_i] [I′ [I tense/agreement] [VP [NP t_i] [V′ [V respect] [NP her_i neighbors]]]]]]]

Figure 3: A grammatical example

7. The algorithm at work

Let us first consider the grammatical example from (3b). The structure built by the algorithm is shown in Figure 3; the main steps in deriving this structure are:

1. START BUILDING CHAIN. Who is identified as a wh-phrase. In order to be properly licensed it must (i) appear in specifier of CP position to be interpreted as an operator, (ii) be assigned Case, and (iii) be assigned a θ-role. Because who is an operator, its chain is assigned a new index.
2. A CP is projected in order to satisfy the first licensing requirement of the wh-phrase.
3. Attach who into specifier of CP, satisfying the first licensing requirement. Push trace of who with the remaining two licensing requirements (Case and θ-marking) onto the trace stack.
4. An IP is projected from [I tense/agr].
5. Attach the trace of who into the specifier of IP so that it can receive Case-marking, thereby satisfying the second licensing requirement. Push trace of who with the remaining licensing requirement (θ-marking) onto the trace stack.
6. Attach IP as complement of C.
7. A VP is projected from [V respect].
8. Attach the trace of who into the specifier of VP so that it can receive θ-marking, thereby satisfying the last licensing requirement.
9. STOP BUILDING CHAIN. Since all licensing requirements have now been met, stop chain-building activities.
10. Attach VP as complement of I.
11. The pronoun her is c-commanded by both the head of the chain who and the foot of the chain t_i, and so is not considered to be a candidate for chain membership. Hence, her can be indexed freely, even with the index of the newly-minted chain (as in Figure 3).
12. Attach her neighbors as object of the verb.

Consider now the example of weak crossover shown in (3a). Figure 4 shows the structure built by the algorithm. I now discuss why the algorithm cannot derive the (impossible) interpretation represented in (3a). Processing proceeds as in the previous example until the pronoun her is encountered. Her is considered to be a candidate for chain membership since it is c-commanded by the head of the chain who; since the foot of the chain has yet to be identified, it fails to c-command her. Her cannot be incorporated into the chain, however, because it is incompatible with the licensing features of the chain. The pronoun her receives genitive Case marking; genitive Case marking of who would surface as whose.[8] Therefore, her is contraindexed with the chain. Processing proceeds from this point on as in the previous example, resulting in the structure in Figure 4; the impossible coindexed structure is simply not recovered by the parser.
[8] There are additional reasons for excluding her from the chain, but this is sufficient for present purposes. Note, however, that while a pronoun is bad in a “surface” position (as in “What_i did [pictures of it_i] actually resemble?”), acceptability improves when the pronoun is embedded more deeply (as in “What_i did [the pictures that Mary painted of it_i] actually resemble?”). Though this “improvement with embedding” effect appears amenable to analysis as a processing phenomenon, I have no insights at this point.
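The way licensing requirements drive chain-building in the derivation of (3b) can be illustrated with a short Python sketch. It is a simplification under stated assumptions rather than the parser itself: the function `build_chain`, the feature labels `"scope"`, `"case"` and `"theta"`, and the idea of listing structural positions together with the single requirement each can satisfy are all hypothetical devices introduced for illustration, and the filler is treated as the first item on the trace stack.

```python
def build_chain(filler, requirements, positions):
    """Discharge licensing requirements one position at a time.

    positions is a left-to-right list of (position, feature) pairs, where
    feature is the licensing requirement that position can satisfy.
    Returns the chain built and whatever is left on the trace stack.
    """
    # The pending element, paired with its unsatisfied requirements.
    trace_stack = [(filler, set(requirements))]
    chain = []
    for position, provides in positions:
        if not trace_stack:
            break                          # STOP BUILDING CHAIN
        element, pending = trace_stack[-1]
        if provides not in pending:
            continue                       # nothing this chain needs here
        trace_stack.pop()
        chain.append((position, element))  # attach element in this position
        remaining = pending - {provides}
        if remaining:
            # Push a trace carrying the requirements still to be met.
            trace_stack.append(("t", remaining))
    return chain, trace_stack


positions_3b = [
    ("Spec,CP", "scope"),   # operator position
    ("Spec,IP", "case"),    # Case-marked subject position
    ("Spec,VP", "theta"),   # theta-marked position
    ("Comp,V", "theta"),    # object position, reached after the chain closes
]
chain, leftover = build_chain("who", {"scope", "case", "theta"}, positions_3b)
```

On this sketch the chain for (3b) is closed at the specifier of VP, so the object position, where her neighbors is attached, is never considered for chain membership; this corresponds to steps 1 to 9 above.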
[CP [NP who_i] [C′ [C do] [IP [NP_k her_j neighbors] [I′ [I tense/agreement] [VP [NP t_k] [V′ [V respect] [NP t_i]]]]]]]

Figure 4: A case of weak crossover in English

7.7. Palauan

Before we consider examples from Palauan, recall that I am assuming that chains are always built from left to right during processing. This entails that an LF chain will be built from its foot to its head, with the head moving to the right. Thus, the parser is engaged in chain-building while it is processing elements to the right of the chain’s foot site. Figure 5 shows the transition from an early stage of processing to the final structure. The structure on the left represents what has been built at the point that who has been attached into the structure. Since not all of the licensing requirements of who are met in this position (who is an operator and must occupy a specifier of CP position in the final structure), chain-building begins. The chain is built from foot to head, and from left to right. Thus, the chain is being built when the pronoun her is encountered. Since the tail of the chain fails to c-command the position of the pronoun, her is considered to be a candidate for chain membership.[9] However, just as in the English case, the pronoun is not compatible with the chain, and so must be contraindexed with it. This is shown in the structure on the right of Figure 5.

Figure 5: In-situ wh-phrase induces weak crossover (showing foot-to-head and left-to-right chain construction)

Figure 6 shows the structure recovered for the example in (4a). Since the wh-phrase has moved overtly, the chain-building process is completed before the relevant pronoun is encountered, and so it is not considered a candidate for chain membership.
8. Scrambling

As mentioned in footnote 7, it is possible that the Palauan cases in which an operator has moved overtly can be analysed as involving local scrambling to an A position. If this is the case, then the Bijection Principle and the PCOB make the right predictions, since the pronoun is then A-bound from the scrambled position, and not A′-bound by the operator. The chain-building algorithm described here also makes the correct predictions under this analysis, since the landing site of scrambling c-commands the position of the pronoun. Hence, the foot of the A′-chain whose head is in the specifier of CP position and whose foot is in the A position (the landing site for the scrambling) does c-command the pronoun, and the pronoun is not considered to be a candidate for chain membership.

[9] The head of the chain has not yet been established, so we cannot answer whether or not the head of the chain c-commands her; this condition is thus inapplicable at this point of processing.
Figure 6: Overtly moved wh-phrase avoids weak crossover

If A-scrambling is available in Palauan, then there must be some mechanism for ensuring that it is not available as an LF movement, else the Bijection Principle and PCOB will make the wrong predictions regarding the in-situ case (they will rule the in-situ case in, since an option is to again scramble the operator to an A position before forming the A′-chain). The chain-building algorithm will rule the in-situ cases out regardless, since such a scrambling chain will also be formed from left to right. Thus, the pronoun will again be considered a candidate for chain membership, yet it is not compatible with the chain, and will therefore be contraindexed with the chain.
9. Functional determination

Below I propose an extension of my chain-building algorithm which makes crucial use of a functional algorithm for determining the properties of empty elements. Functional determination of empty categories in the grammar was proposed and abandoned (compare, for example, Chomsky (1982) and Chomsky (1986)), but from a processing perspective a deterministic algorithm for deciding the properties of empty categories is attractive. Serial processing favours such functional determination because the alternative approaches lead to backtracking.

One view of empty categories is that there are distinct empty categories which are intrinsically specified with feature content in the lexicon. From the perspective of grammar this means that once an empty category is inserted in the course of a derivation, its feature specification will be fixed. From the perspective of the parser this means that empty categories are treated as homophonous. A serial parser will then have to “guess and revise” to find the correct empty category to fill a particular position.

Another view is that there is basically one empty category, and that features are freely assigned to it to account for its differing behaviour in different environments. This leads to the same problem as with intrinsic feature specification from the parser’s point of view: what set of features should it associate with the empty category?

Functional determination assumes that there is only one empty category, but the features associated with the empty category in a particular structural context are fixed deterministically by an algorithm: there is no “guesswork” involved. The standard notion of functional determination concerns the specification of the features [±pronominal] and [±anaphoric]. I propose instead a functional determination of empty categories based on their licensing requirements, from which their pronominal/anaphoric status can be derived. The basic idea is the same, however: there is a deterministic algorithm which we can follow to determine the features of a given empty element. The next section outlines the basis for such an algorithm, a licensing lattice.[10]
[10] “A partially ordered set is a set S together with a relation ≤ on S such that ≤ is reflexive, antisymmetric and transitive.” “A lattice is a partially ordered set in which each pair of elements has a least upper bound and a greatest lower bound.” (from Durbin (1985:294–298)).

10. Licensing of empty elements

Government-Binding theory provides an inventory of phonetically empty elements based on the features [±pronominal] and [±anaphoric], yielding a four-way typology of empty categories. This typology seems too restrictive, however, in that elements with a given feature specification may not behave similarly cross-linguistically. Consider the null pronominal pro. In a language such as Italian pro is viewed as a null counterpart to overt pronouns, and it is taken to be licensed by “rich” agreement features. When licensed by rich agreement pro does not behave at all like an operator, yet it has been proposed that an empty operator is simply a pro which is licensed in an operator position. Browning (1987), for instance, suggests that empty operators in English are null pronominals; given the standard typology of empty categories based on the feature set {[±anaphoric], [±pronominal]}, the only viable option is to equate empty operators with pro. Notice that the properties of pro are in effect partially functionally determined; if it is licensed by agreement, it acts as a null counterpart to pronouns, but if it is licensed in an operator position, then it behaves like an operator. The standard typology is not fine-grained enough to allow purely intrinsic feature specification.

The suggestion that there is one null pronominal which behaves differently in different environments is at odds with the behaviour of overt pronominals. Overt pronominals do not act as operators (though see Demirdache (1991) for such a proposal). An account of this difference is offered by Tsai (1994), who attributes it to a difference in the internal structure of wh-words and pronominals. In Tsai’s analysis both pronominals and wh-words have at their core an indefinite morpheme. This indefinite morpheme introduces a variable which must be bound. In the case of pronominals, a definite morpheme binds the variable. In the case of wh-words a wh morpheme is present but does not bind the variable. Tsai proposes that in English this variable is bound by a word-internal operator, such as a Q(uestion)-binder, while in Chinese the variable is bound by a word-external operator. Figure 7 shows the structure Tsai proposes for English interrogative wh-words.

Figure 7: Internal structure of interrogative wh-words

I suggest a typology of empty elements based on their licensing requirements rather than on their pronominal/anaphoric properties. In Figure 8, I have arranged the set of empty elements which do not require an antecedent[11] into a lattice structure. The lattice is constructed from the set of all subsets of features drawn from the set {θ, Case, Scope}, where “Scope” is meant to represent whatever feature forces the element (an operator of some sort) to appear in a scope/operator position (specifier of CP) at LF. These features represent licensing requirements. The idea I want to capture is that the default element which the parser chooses from the lattice is the smallest element which is consistent with the licensing properties of the empty position.

Figure 8: Lattice of empty elements
  [+th,+ca,+sc]  pro (as in English: an operator)
  [+th,+ca]      pro (as in Italian)
  [+th,+sc]      PRO (as an operator)
  [+ca,+sc]      empty expletive operator
  [+th]          PRO
  [+ca]          empty expletive
  [+sc]          unselective binder, Q-binder or [+wh]
  [] (no features)  ∅ (i.e. nothing)

Consider the case of an optionally transitive verb, when no filler has been identified (recall that this lattice is for empty elements which do not require an antecedent). The parser will choose ∅, since there are no licensing requirements to fulfill. The parser thus inserts nothing, and treats the verb as intransitive. Suppose, however, the position being considered has both a θ-role and abstract Case assigned to it. In this case the minimal element consistent with these features is chosen from the lattice: pro_Italian.

This is almost completely uninteresting. However, there is a wrinkle: any given language may not have all the positions in the lattice available. Thus, while Italian has a null pronominal which obeys both the θ-criterion and the Case filter, English does not. English does, however, have empty operators. In the typology I have set up here, this English “pro”, let’s call it pro_English, is the top of the lattice. Since English does not have a counterpart to Italian pro, whenever an empty position which is θ- and Case-marked is encountered, the only option is to postulate an empty operator (the minimal element from the English lattice which can bear both θ- and Case marking).[12] Since this element also bears a “scope” licensing requirement, it must occupy a “scope” position at LF. Thus, English does not support the null pronominal interpretation which Italian does; in these cases an operator-variable chain is constructed, which must be given an interpretation through some other mechanism, such as predication.[13] Thus, while pro_Italian and pro_English may both be specified as [+pronominal, −anaphoric], they are distinguished by their licensing requirements (which presumably reflects a difference in internal structure) in the given lattice.

To summarize, the idea is that there is a natural ordering which indicates the preferred selection of an empty element. Not all languages have all elements. There can be gaps in the lattice structure for a given language (witness the difference between an Italian-type pro and an English-type pro). The parser will choose the smallest element of the lattice which fulfills all the requirements of the encountered position. It is not clear that every element in the lattice is actually realized in at least one language; I leave exploration of this issue for future research. I would like to suggest, though, that the element whose only licensing feature is [+sc] in the lattice might be identified with the operator which in Tsai’s analysis unselectively binds the variable introduced by the indefinite.

[11] This set can be characterized in a number of ways. For instance, it is {x | x is phonetically empty and can head its own chain} = {x | x is phonetically empty and is not a trace} = {x | x is phonetically empty and is not subject to the ECP}.
[12] It would be more in line with current thinking to say that all the elements from the lattice are universally (i.e. cross-linguistically) available, but that some languages do not license all of the elements of the lattice. One
11. Revisiting weak crossover In the earlier description of the chain-building algorithm I did not discuss how nontrace empty elements are identified. For non-optional positions which are not lexically filled, the parser simply chooses the smallest compatible element from the lattice, as discussed in the previous section. In this section I consider the conditions under which the [+sc] element is proposed. Hebrew relative clauses present an interesting puzzle. Consider the following Hebrew data (from Demirdache (1991:51–56)): (5)
a.
b.
?ohevet ti * ha-?iˇsi sˇe ?im-oi the-man that mother-hisi loves ‘the man that his mother loves’ ha-?iˇsi sˇe ?im-oi ?ohevet ?otoi the-man that mother-hisi loves himi
reason I have not chosen to make this translation here is that this would require the incorporation of additional licensing possibilities into the lattice, such as “agreement” for proItalian ). I leave exploration of this option for future work, noting that for the purposes of this paper the lattice structure presented in the text is sufficient to outline the functional determination mechanism I propose. 13 This, I have argued elsewhere (Alphonce 1993), is what happens in infinitival relatives in English.
c.
‘the man that his mother loves him’ ?ohevet ti * ha-?iˇsi (ˇse) ?otoi xana ?amra sˇe ?im-oi the-man (that) himi Xana said that mother-hisi loves
Example (5a) is ruled out as expected. The weak crossover configuration cannot be recovered by the parsing algorithm. If this case is considered parallel to the English case, then it will be ruled out for the same reasons. If instead an empty operator is postulated at the object position of the verb, it must be moved (to the right) to the specifier of CP position. Since the operator comes after the pronoun, it will be contraindexed with the pronoun (recall that when an operator-variable chain is established, it is given a new index, one that has not yet occurred in the structure). Example (5b), which employs a resumptive pronoun strategy, is unexpectedly ruled in. There parser does not encounter an empty position within the relative clause, so no empty operator is inserted. However, the relative clause must be licensed through predication, notwithstanding the fact that there is no obvious operator-variable pair within the relative clause to support its interpretation as a predicate. I suggest that the minimal operator-like element from the lattice (the [+sc] unselective binder) is inserted as a last resort measure to create a predicate. This binder can unselectively bind the two pronouns 14 to create the required operator-variable structure. 15 In example (5c) the movement of one pronoun results in deviance. This follows straightforwardly from the chain-building algorithm. Between the moved pronoun and its extraction site the parser is engaged in building a chain. The possessive pronoun is a candidate for chain membership, but is not compatible with the chain, and so is contraindexed with the chain. Note that this explanation does not depend on the the head of the chain being an operator, as do the Bijection Principle and PCOB approaches to weak crossover. What is important is whether or not the parser is engaged in building a chain.
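The contrast among (5a-c) turns on whether the parser is in the middle of building a chain when it meets a pronoun. The following minimal Python sketch illustrates only that one idea; it is not the chain-building algorithm itself. The token encoding (`filler`, `pronoun`, `gap`, `other`), the function name `contraindexed_pronouns`, the abstract `Op` filler used to model (5a) on a par with English, and the simplification that every pronoun met inside an open chain is incompatible with it are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str
    kind: str  # hypothetical encoding: "filler", "pronoun", "gap" or "other"


def contraindexed_pronouns(tokens: List[Token]) -> List[str]:
    """Scan left to right; collect pronouns met while a chain is open.

    Simplifying assumption: any pronoun encountered between a filler and
    its gap is incompatible with the chain and so is contraindexed with it.
    """
    chain_open = False
    contraindexed = []
    for tok in tokens:
        if tok.kind == "filler":
            chain_open = True       # the parser starts building a chain
        elif tok.kind == "gap":
            chain_open = False      # the chain is completed at the gap
        elif tok.kind == "pronoun" and chain_open:
            contraindexed.append(tok.form)
    return contraindexed


# (5a), modelled like the English case with an abstract filler "Op" (assumption)
EX_5A = [Token("ha-?iš", "other"), Token("Op", "filler"), Token("še", "other"),
         Token("?im-o", "pronoun"), Token("?ohevet", "other"), Token("t", "gap")]

# (5b): resumptive pronoun, so no filler and no gap; no chain is ever open
EX_5B = [Token("ha-?iš", "other"), Token("še", "other"),
         Token("?im-o", "pronoun"), Token("?ohevet", "other"),
         Token("?oto", "pronoun")]

# (5c): the moved pronoun ?oto is the filler heading the chain
EX_5C = [Token("ha-?iš", "other"), Token("?oto", "filler"), Token("xana", "other"),
         Token("?amra", "other"), Token("še", "other"),
         Token("?im-o", "pronoun"), Token("?ohevet", "other"), Token("t", "gap")]

for name, example in [("5a", EX_5A), ("5b", EX_5B), ("5c", EX_5C)]:
    print(name, contraindexed_pronouns(example))
```

In this toy encoding the possessive pronoun is contraindexed in (5a) and (5c), where it falls inside an open chain, while in (5b) both pronouns stay free and remain available to the last-resort [+sc] binder.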
¹⁴ The binder will bind all pronouns it is co-indexed with.

¹⁵ Note that internally headed relative clauses (IHRCs; see Williamson (1987) and Culy (1990) for a discussion) may fall under the same umbrella. In IHRCs the internal head is subject to an indefiniteness restriction. This brings to mind the question words in languages such as Chinese, which are in fact indefinite elements which derive their interpretation from a c-commanding operator; different operators yield different interpretations for the indefinite. I will not pursue this further here.

12. Summary

In this paper I presented a chain-building algorithm and showed how it can offer an account of the weak crossover phenomenon, considering data from English and Palauan. I also proposed an extension to my chain-building model which might account for weak crossover facts in Hebrew. The extension is based on functional determination of empty elements on the basis of their licensing requirements rather than their pronominal/anaphoric status. The empirical adequacy of the extended model remains to be established.
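Functional determination by licensing requirements can be illustrated with a minimal Python sketch of the selection step: given the features a position assigns and the elements a language makes available, pick the smallest element of the Figure 8 lattice that covers those features. The names `fs`, `LATTICE`, `ITALIAN`, `ENGLISH` and `choose_element` are hypothetical, and the two language inventories are assumptions: the text only states that English lacks the [+th,+ca] element. The tie-breaking rule for equally small candidates is likewise an assumption, since the text does not specify one.

```python
from typing import FrozenSet, Iterable, Optional

Features = FrozenSet[str]


def fs(*names: str) -> Features:
    """Build a feature set from the abbreviations th (θ), ca (Case), sc (Scope)."""
    return frozenset(names)


# The eight elements of the lattice in Figure 8, keyed by licensing features.
LATTICE = {
    fs(): "nothing",
    fs("sc"): "unselective binder, Q-binder or [+wh]",
    fs("ca"): "empty expletive",
    fs("th"): "PRO",
    fs("ca", "sc"): "empty expletive operator",
    fs("th", "sc"): "PRO (as an operator)",
    fs("th", "ca"): "pro (as in Italian)",
    fs("th", "ca", "sc"): "pro (as in English: an operator)",
}

# Illustrative inventories (assumptions, apart from the English gap at [+th,+ca]).
ITALIAN = set(LATTICE)
ENGLISH = set(LATTICE) - {fs("th", "ca")}


def choose_element(required: Features,
                   available: Iterable[Features]) -> Optional[Features]:
    """Return the smallest available element bearing all required features.

    Ties between equally small candidates are broken alphabetically; this
    is an assumption made only to keep the sketch deterministic.
    """
    candidates = [e for e in available if required <= e]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (len(e), sorted(e)))


# Optionally transitive verb, no licensing requirements: nothing is inserted.
print(LATTICE[choose_element(fs(), ENGLISH)])
# A θ- and Case-marked empty position in each language.
print(LATTICE[choose_element(fs("th", "ca"), ITALIAN)])
print(LATTICE[choose_element(fs("th", "ca"), ENGLISH)])
```

Because the English inventory has a gap at [+th,+ca], the same request that yields the Italian-type pro falls through to the top of the lattice, which carries the additional Scope requirement.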
References

[1] Alphonce, Carl. (1993). “Recovering a logical form representation using a single-pass principle-based parser.” In Paul McFetridge and Fred Popowich, editors, Proceedings of the First Conference of the Pacific Association for Computational Linguistics. The Pacific Association for Computational Linguistics.
[2] Alphonce, Carl and Henry Davis. (Forthcoming). “Motivating non-directional movement.” In Henk van Riemsdijk, David LeBlanc, and Dorothee Beermann, editors, Rightward Movement, Amsterdam. John Benjamins.
[3] Berwick, Robert C. and Amy S. Weinberg. (1984). The Grammatical Basis of Linguistic Performance: Language Use and Acquisition. Current Studies in Linguistics. The MIT Press.
[4] Bever, Thomas G. and Brian McElree. (1988). “Empty categories access their antecedents during comprehension.” Linguistic Inquiry, 19:35–43.
[5] Browning, Margaret. (1987). Null Operator Constructions. PhD thesis, MIT.
[6] Chomsky, Noam. (1976). “Conditions on rules of grammar.” Linguistic Analysis, 2(3).
[7] Chomsky, Noam. (1982). Some concepts and consequences of the theory of government and binding. Linguistic Inquiry Monographs 6. The MIT Press.
[8] Crain, S. and J. D. Fodor. (1985). “How can grammars help parsers?” In D. Dowty, L. Karttunen, and A. Zwicky, editors, Natural Language Parsing: Psychological, Computational and Theoretical Perspectives. Cambridge University Press, Cambridge.
[9] Culy, Christopher. (1990). The syntax and semantics of internally headed relative clauses. PhD thesis, Stanford University.
[10] Davis, Henry and Carl Alphonce. (1992). “Parsing, WH-movement and linear asymmetry.” In Kimberley Broderick, editor, Proceedings of the North East Linguistic Society 22, Amherst. GLSA, University of Massachusetts.
[11] Demirdache, Hamida Khadiga. (1991). Resumptive chains in restrictive relatives, appositives and dislocation structures. PhD thesis, MIT.
[12] Frank, Robert. (1992). Syntactic locality and Tree Adjoining Grammar: Grammatical, Acquisition and Processing perspectives. PhD thesis, University of Pennsylvania.
[13] Frazier, Lyn and Giovanni B. Flores D’Arcais. (1989). “Filler driven parsing: A study of gap filling in Dutch.” Journal of Memory and Language, 28(3):331–344.
[14] Garey, Michael R. and David S. Johnson. (1979). Computers and intractability: A guide to the theory of NP completeness. San Francisco: W. H. Freeman.
[15] Georgopoulos, C. (1991). Syntactic variables: Resumptive pronouns and A′ binding in Palauan, volume 24 of Studies in Natural Language and Linguistic Theory. Kluwer Academic Publishers.
[16] Gorrell, Paul. (1995). Syntax and parsing. Cambridge University Press.
[17] Koopman, Hilda and Dominique Sportiche. (1982). “Variables and the bijection principle.” The Linguistic Review, 2:139–160.
[18] MacDonald, Maryellen C. (1989). “Priming effects from gaps to antecedents.” Language and Cognitive Processes, 4(1).
[19] Marcus, Mitchell P. (1980). A Theory of Syntactic Recognition for Natural Language. The MIT Press.
[20] Mazuka, Reiko. (1991). “Processing of empty categories in Japanese.” Journal of Psycholinguistic Research, 20(3):215–232.
[21] Nakayama, Mineharu. (1990). Accessibility to the antecedents in Japanese sentence comprehension. Ms., The Ohio State University.
[22] Pritchett, Bradley. (1992). Grammatical Competence and Parsing Performance. The University of Chicago Press.
[23] Ristad, Sven Eric. (1990). Computational Structure of Human Language. PhD thesis, MIT.
[24] Safir, Ken. (1984). “Multiple variable binding.” Linguistic Inquiry, 15(4):603–638.
[25] Stowe, Laurie A. (1986). “Parsing WH-constructions: evidence for on-line gap location.” Language and Cognitive Processes, 1(3):227–245.
[26] Tsai, Wei-Tien. (1994). On Economizing the theory of A-bar dependencies. PhD thesis, MIT.
[27] Williamson, Janis S. (1987). “An indefiniteness restriction for relative clauses in Lakhota.” In E. Reuland and A. ter Meulen, editors, The Representation of (In)definiteness. The MIT Press, Cambridge, Massachusetts.