Journal of Psycholinguistic Research, Vol. 23, No. 4, 1994

Processing Local and Unbounded Dependencies: A Unified Account

Martin J. Pickering¹

This paper proposes an account of incremental sentence processing and the initial stage of syntactic ambiguity resolution, based on the claim that the processor seeks to provide semantic interpretations for sentence fragments as soon as it possibly can. In this model, there is no fundamental distinction between local and unbounded dependencies. The processor employs a version of categorial grammar based on dependency grammar, in which dependency constituents are derived from dependencies between words and are permitted to overlap. The processor seeks to form dependency constituents as soon as it can, and to give interpretations to these fragments immediately. The initial stage of ambiguity resolution is determined by the principle of dependency formation, under which the processor automatically chooses an analysis that allows a single dependency constituent to be formed in preference to one that does not. The motivation is semantic: Such an analysis maximizes the amount of incremental interpretation that is possible. But if more than one analysis is compatible with the formation of a single constituent, the processor can appeal to a range of sources of nonsyntactic information in making its choice. I show how this account can capture a range of psycholinguistic evidence without positing a fundamental distinction between local and unbounded dependencies.

¹ I would like to thank Holly Branigan, Ted Gibson, Simon Liversedge, Carson Schütze, Virginia Teller, and Matt Traxler for comments on the manuscript, and members of the Sentence Processing Group of the Human Communication Research Centre, Universities of Edinburgh and Glasgow, for discussion. I acknowledge the support of ESRC grant no. R000234542 and a British Academy Postdoctoral Fellowship. Address all correspondence to Human Communication Research Centre, Department of Psychology, University of Glasgow, 56 Hillhead Street, Glasgow, G12 9YR, United Kingdom.

0090-6905/94/0700-0323$07.00/0 © 1994 Plenum Publishing Corporation

INTRODUCTION

This paper assumes that the human sentence processor attempts to interpret what is heard or read as quickly as it can. In other words, interpretation is incremental. This is compatible with the intuition that there is no lag between hearing or reading and the beginning of understanding. As an utterance is encountered, the representation of its meaning is continually updated. Therefore, there is no need for the processor to wait until the end of an utterance before producing an interpretation.

Psycholinguists have discussed the notion of incremental interpretation since Marslen-Wilson (1973, 1975), who demonstrated that the process of language understanding can be affected by the meaning of what is being processed within a few hundred milliseconds. In other words, some aspects of semantic interpretation occur very quickly. More recently, Altmann and Steedman (1988) demonstrated that the processor can rapidly compute the possible referents of noun phrases during comprehension, and that this affects subsequent aspects of parsing. In addition, there are numerous demonstrations that the plausibility of a sentence fragment has rapid effects on the process of reading (e.g., Clifton, 1993; Ferreira & Clifton, 1986; Holmes, Kennedy, & Murray, 1987; Rayner, Carlson, & Frazier, 1983; Stowe, 1989; Trueswell, Tanenhaus, & Garnsey, 1994).

Despite this work, the study of incremental processing has for the most part concentrated on the issue of the way in which syntactic analysis is performed on-line. Work since Bever (1970) has assumed that syntactic analyses are adopted for sentence fragments. The fact that locally ambiguous sentences such as The horse raced past the barn fell are harder to process than sentences with no such ambiguity but which are otherwise similar (e.g., with ridden replacing raced) demonstrates that misanalysis must have occurred and strongly suggests that syntactic processing occurs incrementally. Of course this particular analysis may not always be chosen, and most investigations of semantic factors in sentence processing have been designed to determine whether semantics can affect choice of syntactic analysis.
There has, in general, been less interest in the issue of what aspects of semantic processing are actually performed in an incremental manner. Probably the main exception is the work on referential context effects (Altmann & Steedman, 1988; Crain & Steedman, 1985), but even this has concentrated on one aspect of interpretation alone and has focused on the issue of whether it can affect syntactic analysis. There has, in contrast, been little discussion of whether the processor is able to make inferences, using the interpretation of the fragment as a premise, as a sentence is encountered. This paper does not consider such issues of incremental interpretation, but it does assume that the processor computes syntactic analyses incrementally in order that it may also compute a semantic representation incrementally. This representation makes it possible for deeper aspects of interpretation to take place in principle. In particular, I claim that the processor performs syntactic analysis and resolves syntactic ambiguity in a way that is likely to maximize incremental interpretation. I present a theory consistent with this assumption below (for more detailed discussion of how incremental interpretation could work, see Pulman, 1986; Shieber & Johnson, 1993; Stabler, 1991; Steedman, 1989, 1992).

I assume that certain sentence fragments admit a greater degree of incremental interpretation than others. For example, a fragment consisting of two noun phrases, such as the boy a book, cannot be interpreted as a whole. This contrasts with a complete sentence, but also with sentence fragments such as loves Mary (which is a traditional constituent) and gives the boy or the man loves (which are not). The processor always prefers to compute fragments which can be interpreted as a whole, primarily because this maximizes incremental interpretation.

Clearly, we now need an account of which sentence fragments can be interpreted as a whole. Such an account is provided by the framework of dependency categorial grammar (Pickering & Barry, 1993), and, in particular, in its approach to constituency. Dependency categorial grammar is based on the framework of dependency grammar, where the fundamental syntactic relations, known as dependencies, hold between pairs of words. We then define a constituent as a sequence of words which are all connected by dependencies. This approach to constituency is unusual, and in particular is alien to standard phrase structure grammars. However, it is found within flexible categorial grammars (e.g., Moortgat, 1988; Steedman, 1987), and is compatible with dependency grammars (e.g., Hudson, 1990). Categorial grammar assumes a very close relationship between syntax and semantics, with each syntactic combination of constituents of the grammar being paired with a semantic combination between the interpretations of those constituents. This means that all (and only) constituents of the grammar automatically receive semantic interpretations.
The language processor employs the theory of grammar in a particularly direct manner: All and only constituents of the grammar may be constructed by the processor and given unified interpretations (where the meanings of each word contribute to the meaning of the whole constituent). The processor interprets what it can as soon as it can, so the process of constituent construction is as immediate as possible. The processor can now immediately integrate the interpretation of a constituent with general knowledge and use this interpretation to affect other cognitive processes and behavior (see Chater, Pickering, & Milward, 1994, for an account of how this could occur).

AMBIGUITY RESOLUTION

Because the human sentence processor seeks to form constituents and construct interpretations whenever possible, it initially chooses an analysis which forms a constituent in preference to one which does not. This is the only syntactic principle employed in ambiguity resolution. If, however, there is more than one way to form a constituent, then the processor can employ nonsyntactic information in order to choose between the alternatives. In such cases, it will make use of all relevant information with the goal of ensuring that the analysis it chooses is most likely to be the correct one. It follows that some ambiguities are resolved in a modular manner, making reference solely to syntactic information, whereas some are not. The way in which an ambiguity is resolved depends on the construction type involved. Two examples will illustrate the fundamental dichotomy:

1. Though George kept on reading the story still bothered him.
2. The journalist interviewed the daughter of the colonel who had had an accident.

In example 1, the story could be the object of reading or the subject of a new clause. In this theory, the fragment though George kept on reading is a constituent. If the story is treated as the object of reading, then the fragment though George kept on reading the story is also a single constituent. But if the story is treated as the subject of the main clause, then the fragment ceases to be a constituent, and cannot be interpreted as a whole. Informally, the story has not been attached into the subordinate clause. Because the fragment can be interpreted as a whole under the first analysis but not under the second, the processor initially adopts the first analysis. When the rest of the sentence is encountered, it becomes clear that the chosen analysis was wrong, and a garden path effect ensues.

Example 2 is processed very differently. The ambiguity concerns whether it is the daughter or the colonel who has had the accident. In many theories (e.g., Frazier, 1979), the processor initially attaches the relative clause "low," and therefore assumes that the colonel has had the accident.
But within the present theory, no automatic syntactic principle applies. The reason is that the processor has formed a constituent under either analysis, and hence constructs two competing interpretations. In such cases, the processor can appeal to nonsyntactic sources of information to decide on an analysis. Precisely how this happens is not addressed by this paper, though the assumption is that many different sources of information can be employed in parallel as weak constraints. The distinction between two kinds of ambiguity is also found in unbounded dependencies such as example 3 below:

3. I know whom you believe John supports.


The model predicts that whom is initially treated as the object of believe. In fact, whom turns out to be the object of supports, so the initial analysis has to be retracted. Considerable experimental evidence supports this account, as discussed below, and a similar assumption is made in many current theories [e.g., Clifton & Frazier's (1989) active filler strategy]. In the current model, there is no need for a separate principle to deal with unbounded dependencies, and there is no role for empty categories or gaps (Pickering, 1993; Pickering & Barry, 1991).

In general, I propose that some ambiguities are resolved by reference to syntactic information alone, but other ambiguities are resolved using nonsyntactic information. Which approach is used is dependent on the particular ambiguity involved, and is determined by the analyses that can be assigned to the fragment. This is discussed in detail below.

This model stands in contrast to two other classes of account, which can only be mentioned briefly here. One class of account holds that the processor makes all initial decisions on the basis of syntactic strategies alone. The best-known proposal is due originally to Frazier (1979), and is sometimes called the "garden path model." It employs the strategies of minimal attachment and late closure, together with the active filler strategy, and also assumes that subcategorization information is ignored. There are also other syntactically driven accounts (e.g., Abney, 1989; Gorrell, 1994; Kimball, 1973; Pritchett, 1992) which assume different syntactic strategies for the initial process of ambiguity resolution. Alternatively, the processor may employ nonsyntactic information in all instances of ambiguity resolution.
Such positions have been proposed recently by Gibson (1991), Spivey-Knowlton, Trueswell, and Tanenhaus (1993), MacDonald, Pearlmutter, and Seidenberg (1993), and MacDonald (1994) as "constraint-based" models, and incorporate many of the assumptions of models like Tyler and Marslen-Wilson (1977), Ford, Bresnan, and Kaplan (1982), Crain and Steedman (1985), and Taraban and McClelland (1988). In the extreme case, the processor will employ any available information in order that it can choose the analysis that is most likely to be correct.

It is important to stress that this paper is only concerned with providing an account of the initial process of ambiguity resolution. Indeed, its main claim is that certain ambiguities are resolved on the basis of syntactic strategies alone, and it seeks to characterize these ambiguities and the strategies employed. It will only provide incomplete suggestions about how nonsyntactic processes of ambiguity resolution work. In addition, it does not address any issues concerning what happens after the initial decision is made. Hence, it does not seek to determine what precipitates reanalysis, or why certain kinds of reanalysis seem to be harder than others [as discussed by Pritchett (1988, 1992) in particular].

DEPENDENCY, CONSTITUENCY, AND DEPENDENCY CATEGORIAL GRAMMAR

This section outlines the grammatical theory of dependency categorial grammar (dependency CG) that underlies the account of incremental processing [see Pickering & Barry (1993) for a more complete description; Barry & Pickering (1990), Barry (1992), and Pickering (1993) provide additional information]. First, I describe the notion of dependencies between words employed in this theory, and then derive the notion of a dependency constituent. Finally, I sketch dependency CG, and show how it is able to generate dependency constituents and assign them appropriate syntactic categories and semantic interpretations. It is important to stress that dependency constituents are the fundamental notion within this framework, not dependencies themselves, and the discussion of dependencies below serves mainly to motivate this notion.

Dependency and Dependency Grammar

In dependency grammar, the primitive syntactic links hold between words rather than phrases. These links are called dependencies, and connect the ruler (otherwise known as the head or controller) with the dependent (or controlled): For each dependency, an arrow is drawn leading from the ruler to the dependent. For example, John spoke to Mary can be given the analysis below:

4. John spoke to Mary. (spoke → John, spoke → to, to → Mary)

This carries the information that spoke has the two dependents, John and to, and that to has the dependent Mary. This representation (including the word order information) is called a dependency diagram. This approach to grammatical analysis differs greatly from the phrase structure approach, in that the notion of constituency is not primitive. However, theories of constituency can be derived from dependency diagrams, as discussed below.

The origins of phrase structure grammar can be traced back to Bloomfield's (1933) assumption of immediate constituency, whereby the analysis of a sentence involves recursively dividing constituents into two or more smaller constituents, until individual words are reached. Dependency grammar (DG) has at least as long a history, and is found for instance in the tradition of analysis in terms of government and agreement between words. DGs, in the form considered here, originated with Tesnière (1959), and generative approaches to DG were proposed by Gaifman (1965) and Hays (1964). The amount of work on DGs has been relatively limited, though a major exception is the word grammar framework of Hudson (1984, 1990). An important introduction to syntactic analysis in terms of dependencies is found in Matthews (1981, pp. 78ff.). In this paper we consider the form of DG known as classical DG (as described by Gaifman, 1965).

In DG, the analysis for a sentence is a directed graph with the words at the nodes and the dependencies as edges. Classical DG is defined by the following properties. First, it must be possible to trace a path between any two words in the sentence, ignoring the directions on the arrows (this is known as connectedness). Second, there must be exactly one word (known as the root of the sentence) which has no ruler; in sentence 4, this word is spoke. Finally, every other word must have exactly one ruler (the single-ruler requirement). Mathematically, classical DG defines directed trees together with a representation of linear order. In addition, every diagram in classical dependency grammar obeys adjacency, which is the requirement that no word may intervene between the two words in a dependency unless it is subordinate to one of them. We also say that one word X is subordinate to another word Y if X is either a dependent of Y or a dependent of another word which is itself subordinate to Y. This means that John, to, and Mary are all subordinate to spoke. These formal requirements do not of course tell us how to determine the dependency diagram for a given reading of a sentence.
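As a concrete illustration (a sketch of my own, not part of the paper), the conditions just listed, namely connectedness, a unique root, and the single-ruler requirement, can be checked mechanically; the adjacency requirement is omitted for brevity:

```python
def is_classical_dg(n_words, deps):
    """Check the classical-DG conditions on a diagram over words
    0..n_words-1, where deps is a list of (ruler, dependent) pairs."""
    rulers = {}
    for ruler, dep in deps:
        if dep in rulers:          # single-ruler requirement violated
            return False
        rulers[dep] = ruler
    # Exactly one word (the root) must have no ruler.
    if len([w for w in range(n_words) if w not in rulers]) != 1:
        return False
    # Connectedness: every word reachable from word 0, ignoring arrows.
    adj = {w: set() for w in range(n_words)}
    for ruler, dep in deps:
        adj[ruler].add(dep)
        adj[dep].add(ruler)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == n_words

# Example 4: spoke rules John and to; to rules Mary.
print(is_classical_dg(4, [(1, 0), (1, 2), (2, 3)]))   # True
```

Dropping the dependency from spoke to to, for instance, leaves the diagram disconnected with two rootless words, so the check fails.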
There is of course no infallible way of determining this diagram; in this respect, DG fares no better or worse than phrase structure grammar. The following proposals are not necessary to classical DG (though many of them would probably be noncontroversial), and relate specifically to the theory developed by Pickering and Barry (1993) and adopted in this paper. Here we shall provide some brief exposition only. We begin by assuming that the tensed verb in the main clause is the root of the sentence. We then assume that rulers select the (sub)categories of dependent words. In sentence 4, spoke is the root of the sentence. This word selects two elements, the noun John and the preposition to. This contrasts with phrase structure grammar in the important respect that spoke selects the preposition to rather than a prepositional phrase headed by to. However, John spoke to is not a sentence, because to obligatorily selects a noun, which in this case is Mary. Notice that this criterion of category selection is given important support by the phenomena of government and agreement, which are relations between words rather than constituents. For example, spoke governs the case of its subject, so that John could be replaced with he but not with him.

A few dependency diagrams (taken from Pickering & Barry, 1993) will illustrate the account:

5. Fred thinks Bill laughed. (thinks → Fred, thinks → laughed, laughed → Bill)
6. John suddenly laughed. (laughed → John, laughed → suddenly)
7. John has been shooting grouse.
8. Mary saw herself.

In example 5, laughed is the root of the subordinate clause Bill laughed (i.e., it has no ruler within this clause), and is a dependent of the main verb thinks. If the sentence included the complementizer that, then laughed would be the dependent of that and that would be a dependent of thinks. In example 6, the modifier suddenly is treated as dependent of the verb laughed. This makes the point that our dependency diagrams do not distinguish between argument and adjunct. This account does not, for instance, treat suddenly as the ruler of laughed. The root in example 7 is the tensed verb has, rather than the "main" verb shooting. This demonstrates that the criteria for assuming dependencies are syntactic rather than semantic. The fact that it is John who has been shooting does not mean that John is a dependent of shooting. Example 8 also shows that not all relations between words are realized as dependencies. The constraints on the form of herself can be captured by a part of a fully specified grammar concerned with binding. This illustrates the point that a dependency diagram can roughly be thought of as an alternative to a phrase structure representation for a sentence. It does not seek to capture all linguistically interesting relationships.

Finally, example 9 below demonstrates how some ambiguity is captured by dependency diagrams. It has the two analyses 9a and 9b:

9a. Mary said Sue left yesterday. (said → yesterday)
9b. Mary said Sue left yesterday. (left → yesterday)

The first diagram treats yesterday as modifying said, and captures the "high attachment" reading where Mary spoke yesterday.
The second diagram has yesterday modifying left, and captures the "low attachment" reading where Sue left yesterday (according to Mary). Of course it should be clear that many ambiguities are not differentiated by dependency diagrams. As one example, scope ambiguities are not addressed.

Dependency Constituency

The notion of a constituent is not primitive to DGs. However, it is possible to derive particular theories of constituency from DGs and dependency diagrams. Pickering and Barry (1993) defined a theory of constituency called dependency constituency. A dependency constituent (DC) is an expression consisting of words which are linked by dependencies. In other words, a dependency constituent must be connected: It must be possible to trace a path between any two words in the constituent via dependencies, ignoring the directions on the arrows. The complete sentence will of course be a DC, since it must be connected by definition. In addition, all individual words are DCs. In example 4 above, the other DCs are the following:

10. {John spoke, spoke to, to Mary, John spoke to, spoke to Mary}

We write each DC as the linear order of the words in the sentence. It is straightforward to demonstrate that all of these strings of words are DCs, since they are all connected. In example 5, the DCs are the following:

11. {Fred thinks, Bill laughed, thinks Bill laughed, thinks ... laughed, Fred thinks ... laughed}

Here, we make use of the ... notation to indicate discontinuity. For example, thinks ... laughed is a DC, because there is a dependency between thinks and laughed, even though they are not adjacent. In example 4, all substrings were in fact DCs. But this is not the case in general: For instance, in example 5, the substrings thinks Bill and Fred thinks Bill are not DCs, since they are not connected. This difference will prove fundamental to the discussion of incremental interpretation and to the theory of ambiguity resolution. In this framework, only DCs can be incrementally interpreted, and processing difficulty ensues if a sentence fragment consists of too many unconnected substrings. The initial process of ambiguity resolution involves selecting the analysis which requires the assumption of fewest DCs (without reference to any other factors).
But if more than one analysis requires the same number of DCs, then the choice of analysis is made interactively, with reference to nonsyntactic factors.
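As an illustration (my own sketch; the paper gives no algorithm), the multiword DCs of a diagram can be enumerated by testing each subset of words for connectedness, which reproduces the set in example 11 together with the whole sentence:

```python
from itertools import combinations

def dependency_constituents(words, deps):
    """Return every DC of two or more words, spelled in sentence order,
    with '...' marking a discontinuity."""
    adj = {i: set() for i in range(len(words))}
    for ruler, dep in deps:
        adj[ruler].add(dep)
        adj[dep].add(ruler)

    def connected(subset):
        # breadth of the subset reachable via dependencies, arrows ignored
        members = set(subset)
        seen, stack = {subset[0]}, [subset[0]]
        while stack:
            for nxt in (adj[stack.pop()] & members) - seen:
                seen.add(nxt)
                stack.append(nxt)
        return seen == members

    def spell(subset):
        pieces = [words[subset[0]]]
        for prev, cur in zip(subset, subset[1:]):
            if cur > prev + 1:
                pieces.append('...')
            pieces.append(words[cur])
        return ' '.join(pieces)

    return [spell(s)
            for n in range(2, len(words) + 1)
            for s in combinations(range(len(words)), n)
            if connected(s)]

# Example 5: thinks rules Fred and laughed; laughed rules Bill.
words = ['Fred', 'thinks', 'Bill', 'laughed']
dcs = dependency_constituents(words, [(1, 0), (1, 3), (3, 2)])
# dcs contains 'Fred thinks', 'Bill laughed', 'thinks ... laughed',
# and so on, but not 'thinks Bill' or 'Fred thinks Bill'
```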


The most important property of dependency constituency is that it is flexible, by which we mean that constituents may overlap. For example, John spoke and spoke to overlap in example 4. This is of course not possible in phrase structure grammar, which embodies a rigid theory of constituency (not permitting partial overlap).² Pickering and Barry (1993) argued that flexible constituency in general, and dependency constituency in particular, have considerable linguistic advantages over rigid theories of constituency such as versions of context-free phrase structure grammar. They concentrated on the analysis of coordination, and suggested that dependency constituency can capture the considerable freedom that exists in deciding what expressions can serve as conjuncts in coordinations. It is important to be aware that the arguments for adopting this approach to grammar are not derived solely from its applicability to incremental interpretation.

Dependency Categorial Grammar

Pickering and Barry (1993) proposed a linguistic theory called dependency categorial grammar (dependency CG), which is based on dependency relations. It provides a way of assigning categories to dependency constituents. The categories of phrase structure grammar clearly cannot serve because many dependency constituents are not phrase structure constituents. In addition, dependency CG provides semantic interpretations for all and only constituents of the grammar. I shall ignore the formal details of this model as much as possible in this paper, but there is an important reason why this cannot be done completely. Dependency diagrams cannot describe unbounded dependencies in themselves, and are therefore ultimately inadequate. The reason for this is essentially the same reason that context-free phrase structure grammars with simple categories cannot describe unbounded dependencies (see Gaifman, 1965). However, dependency constituents can capture unbounded dependencies, and it is important to stress that dependency constituents are the central concept of this theory of grammar and processing, not dependencies themselves. Hence we need to define the notion of a dependency constituent.

Pickering and Barry (1993) defined the theory of dependency CG, which assigns categories and associated semantic interpretations to all and only dependency constituents. In CG (Ajdukiewicz, 1935), linguistic expressions are classified according to a system of recursively defined categories (alternatively called types), which are built from a set of primitive categories and a set of connectives. The primitive categories are phrasal rather than lexical. We shall make use of the categories S, S', NP, and PP, which can be equated with their namesakes in phrase structure grammar, and N [corresponding to N' in X' syntax (Jackendoff, 1977)]. Dependency CG assumes the two connectives / and \; this is the "bidirectional" system of Bar-Hillel (1953). The categories for many linguistic expressions are found by combining the primitive categories with the connectives in the following way. The complex category X/Y is given to an expression that can combine with a following adjacent expression of category Y to give an expression of category X. Likewise, the category Y\X is given to an expression that can combine with a preceding adjacent expression of category Y to give an expression of category X.³ This approach is sometimes called "lexicalist," in that most of the combinatory properties of lexical items can be determined on the basis of the categories, and there is no need for large numbers of rules corresponding to the rules of phrase structure grammars. For instance, John is given category NP and John walks is given category S. The verb walks forms a sentence whenever it is preceded by a noun phrase, and hence it receives the category NP\S. Likewise, loves forms a sentence when it is preceded by a noun phrase and followed by a noun phrase, so it may receive the category (NP\S)/NP or the category NP\(S/NP). The preposition to receives the category PP/NP, since expressions like to Sue have the category PP, and Sue has category NP. Therefore spoke can receive the category (NP\S)/PP or NP\(S/PP). Finally, thinks can receive either (NP\S)/S or NP\(S/S). This allows Fred thinks to receive the category S/S, meaning that it is an expression which can combine with an immediately following sentence (such as Bill laughed) to form a sentence.⁴ In CG, semantic considerations are usually very important.

² Note that Gaifman (1965) and Matthews (1981) derive a rigid theory of constituency from DG, where a constituent is an expression containing a word W and every word subordinate to W.
A fundamental assumption is that all syntactic combinations are paired with semantic combinations. CG adopts Frege's principle of compositionality, whereby the meanings of expressions of primitive categories are treated as basic, and the meanings of other expressions are defined in terms of their relation to these meanings. The meaning of an expression of category X/Y or Y\X is given by a function which maps the meaning of any expression of category Y with which it combines to the meaning of the resultant expression of category X. In this paper, we shall not be concerned with the effects of these semantic combinations. The important point is that the use of categorial grammar allows the process of incremental interpretation to be directly linked to the formation of constituents.

Dependency CG is a version of CG which assigns appropriate categories to all and only DCs. It contrasts with other versions of CG: (i) Classical CG (Bar-Hillel, 1953; cf. Ajdukiewicz, 1935) assigns categories to a range of strings in a similar way to phrase structure grammar (e.g., Fred thinks would not be a constituent in example 5); (ii) Lambek CG (Lambek, 1958; Moortgat, 1988) assigns categories to the combination of any two adjacent words, even if these words appear to have no linguistic relationship (e.g., Fred thinks Bill would be a constituent in example 5); (iii) combinatory CG (CCG) (Steedman, 1987) assigns categories to a different range of strings from dependency CG, though there is considerable overlap.

Finally, we should note two points about dependency CG which are dependent on the lexical categories employed. First, the treatment of modifiers is based on the standard approach within DG rather than the standard approach within CG. Modifiers are given simple categories, e.g., PoN (postnominal modifier) for covered in leaves, or whom John likes, making them arguments rather than functors. Second, dependency CG deals with unbounded dependencies by employing complex categories for relative pronouns and similar elements, as in other versions of flexible categorial grammar (e.g., Ades & Steedman, 1982). In the girl whom John likes, whom is given the category PoN/(S/NP), which means that it can be combined with an expression of category S/NP, such as John likes, to give a postnominal modifier, whom John likes, of category PoN. In addition, the girl receives the category NP/PoN, so that it can combine with whom John likes to form an expression of category NP. Hence we can provide a unified account of processing local and unbounded dependencies in terms of DCs, even though such an account is not possible in DG or with dependency diagrams.

³ This is Lambek's (1958) notation for categories. In the alternative notation used by Steedman (1987), what we write as Y\X is written as X\Y.

⁴ Notice that the categories discussed here are the ones relevant to the sentences in question. If walks is used transitively, as in John walks the dog, then its category is NP\(S/NP) or (NP\S)/NP, but not NP\S. Likewise, if spoke or thinks is used intransitively, then the category is NP\S.
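The category combinations discussed in this section can be sketched in code. The tuple encoding below is my own assumption, not the paper's notation: ('/', X, Y) stands for X/Y (seeking a Y to its right) and ('\\', Y, X) for Y\X (seeking a Y to its left). Application alone suffices here, because the curried category NP\(S/NP) lets John likes form an S/NP directly:

```python
def combine(a, b):
    """Combine two adjacent categories by application, or return None."""
    if isinstance(a, tuple) and a[0] == '/' and a[2] == b:
        return a[1]                        # X/Y + Y -> X
    if isinstance(b, tuple) and b[0] == '\\' and b[1] == a:
        return b[2]                        # Y + Y\X -> X
    return None

NP, S, PoN = 'NP', 'S', 'PoN'
lexicon = {
    'John': NP,
    'walks': ('\\', NP, S),                # NP\S
    'thinks': ('\\', NP, ('/', S, S)),     # NP\(S/S)
    'likes': ('\\', NP, ('/', S, NP)),     # NP\(S/NP)
    'whom': ('/', PoN, ('/', S, NP)),      # PoN/(S/NP)
    'the girl': ('/', NP, PoN),            # NP/PoN
}

print(combine(lexicon['John'], lexicon['walks']))    # 'S'
print(combine(lexicon['John'], lexicon['thinks']))   # S/S, as for Fred thinks

john_likes = combine(lexicon['John'], lexicon['likes'])   # S/NP
whom_john_likes = combine(lexicon['whom'], john_likes)    # PoN
print(combine(lexicon['the girl'], whom_john_likes))      # 'NP'
```

Note that this sketch covers application only; the fuller system needed for arbitrary dependency constituents is not attempted here.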

INCREMENTAL PROCESSING

Dependency CG can now be integrated into a model of incremental sentence processing. We must first deal with unambiguous constructions, and then extend the model to ambiguous constructions. The basic assumption is that the processor constructs dependency constituents when it is able to do so and provides interpretations for them. As the processor encounters each word, it attempts to integrate that word into the current DC. If this is not possible, combination is delayed and the processor makes use of a stack. The language processor makes use of what is known as a shift-reduce parser, in a manner similar to the proposals of Ades and Steedman (1982).

Processing Local and Unbounded Dependencies

335

Table I. Examples of the Use of a Stack in Incremental Processing

Word      Category      Reductions   Stack
John      NP            —            NP
spoke     (NP\S)/PP     1            S/PP
to        PP/NP         1            S/NP
Mary      NP            1            S
Fred      NP            —            NP
thinks    (NP\S)/S      1            S/S
Bill      NP            —            S/S  NP
laughed   NP\S          2            S

Let us consider the processing of examples 4 and 5 above using the system of representation illustrated in Table I. Parsing operations are represented in the following way. Each new word is placed on a separate row. The first two columns contain the word and its category. (A more complete account would incorporate a representation of the semantics.) The final columns represent the reductions and the stack, with the bottom of the stack on the left, so that the stack after encountering Bill comprises S/S at the bottom and NP on top.

As each word is encountered, it is incorporated into the current DC if possible. This process of incorporation is known as reduction. Hence, as each word is encountered in example 4, a single reduction is possible, as indicated in the third column. In example 5, when Bill is encountered, a single DC cannot be formed, so the category for Bill has to be placed on the stack. This process is known as shifting. However, when laughed is encountered, it is possible to combine Fred thinks Bill laughed into a single DC of category S. This involves two reductions, because laughed is combined with Bill first, and then Bill laughed is combined with Fred thinks. A shift-reduce parser allows only the two operations of shifting and reduction (though on occasion reduction may be possible in more than one way).

Clearly, the state of the parser at any particular point in processing can be represented linearly, using brackets around the current DCs. First, John spoke to is as follows:

12. [John spoke to]S/NP

In contrast, Fred thinks Bill requires two pairs of brackets, corresponding to the two DCs on the stack:

13. [Fred thinks]S/S [Bill]NP
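The shift-reduce regime of Table I can be sketched in code. The following is my own illustration, not code from the paper: categories are parsed from strings such as (NP\S)/PP, and reduction tries forward application, backward application, forward composition, and type-raising of a subject NP to S/(NP\S) (in the spirit of Ades & Steedman, 1982). The paper does not spell out its exact combinatory rule inventory, so this rule set is an assumption chosen to reproduce the derivations in Table I.

```python
# Illustrative shift-reduce parser over categorial-grammar categories.
# A complex category is a tuple (slash, result, argument); atoms are strings.

def parse(cat):
    """Parse 'X/Y' (seeks Y to the right) or 'Y\\X' (seeks Y to the left)."""
    cat = cat.strip()
    if cat.startswith('(') and cat.endswith(')'):
        depth = 0
        for i, ch in enumerate(cat):
            depth += ch == '('
            depth -= ch == ')'
            if depth == 0 and i < len(cat) - 1:
                break
        else:
            return parse(cat[1:-1])    # strip redundant outer parentheses
    depth = 0
    for i, ch in enumerate(cat):
        depth += ch == '('
        depth -= ch == ')'
        if depth == 0 and ch in '/\\':
            left, right = parse(cat[:i]), parse(cat[i + 1:])
            return ('/', left, right) if ch == '/' else ('\\', right, left)
    return cat                         # atomic category, e.g. 'NP'

RAISED_NP = ('/', 'S', ('\\', 'S', 'NP'))   # subject type-raising target

def combine(left, right):
    """Try to reduce two adjacent categories; return the result or None."""
    if isinstance(left, tuple) and left[0] == '/':
        if left[2] == right:
            return left[1]                        # forward application
        if isinstance(right, tuple) and right[0] == '/' and left[2] == right[1]:
            return ('/', left[1], right[2])       # forward composition
    if isinstance(right, tuple) and right[0] == '\\' and right[2] == left:
        return right[1]                           # backward application
    if left == 'NP':
        return combine(RAISED_NP, right)          # raise the subject, retry
    return None

def shift_reduce(tagged_words):
    """Shift each category onto the stack; reduce whenever possible."""
    stack = []
    for _, cat in tagged_words:
        stack.append(parse(cat))
        while len(stack) >= 2:
            reduced = combine(stack[-2], stack[-1])
            if reduced is None:
                break                             # cannot reduce: item stays shifted
            stack[-2:] = [reduced]
    return stack

# Example 4: each word reduces immediately, leaving a single DC of category S.
print(shift_reduce([('John', 'NP'), ('spoke', '(NP\\S)/PP'),
                    ('to', 'PP/NP'), ('Mary', 'NP')]))       # ['S']
# Example 5: Bill must be shifted; laughed then triggers two reductions.
print(shift_reduce([('Fred', 'NP'), ('thinks', '(NP\\S)/S'),
                    ('Bill', 'NP'), ('laughed', 'NP\\S')]))  # ['S']
```

After Fred thinks Bill, the returned stack holds two items (S/S and NP), matching the two bracketed DCs of example 13.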

336

Pickering

Pickering (1993) demonstrated how this account can be used to explain the differential complexity of various kinds of recursive constructions (e.g., Chomsky, 1965). "Right branching" sentences do not require a deep stack or complex categories, whereas "nested" constructions do require a deep stack, or employ very complex categories. The account also deals with the unbounded dependency examples discussed by Pickering and Barry (1991), and eschews the use of empty categories in the analysis.

One caveat should be made at this point. It is not necessarily the case that every word has to be incorporated into a DC immediately. In particular, it may be the case that function words need not be incorporated with the previous words into a single DC automatically. For example, in the man who Mary loves, the word who may not be combined with the man, but instead waits until Mary loves is encountered, because it associates more closely with the body of the relative clause. A similar example is the preposition in complex noun phrases like the book on the shelf. I propose that the processor does not normally link the book with on, but instead waits until on the shelf has been encountered in its entirety. These cases might increase processing load, because stack depth is increased, but the "weight" of having an essentially grammatical word on the stack on its own might be negligible. In addition, since such words have little meaning on their own, the benefits of immediate combination for incremental interpretation would be very slight. The reason for making this suggestion will become apparent when considering the processing of certain ambiguities below.

THE RESOLUTION OF AMBIGUITY

This section presents the account of ambiguity resolution, known as

the principle of dependency formation. The principle claims that the processor establishes dependencies and forms dependency constituents whenever possible. This means that there are two fundamentally different types of syntactic ambiguity resolution. The rest of the section explores how the two types of ambiguity resolution can be used to explain psycholinguistic data. It argues that the processing of unbounded dependencies can be explained without the need for any additional mechanisms.

The principle of dependency formation (PDF) is applied after each word is encountered. It has two clauses:

(i) Initially select the analysis which requires the postulation of fewer or fewest DCs, without considering any other analyses.
(ii) If two or more analyses require the same number of DCs, construct the alternative analyses in parallel, and select between them in a manner making reference to any available information.


Clause (i) captures the intuition that the processor attempts to make interpretation as incremental as possible. In addition, it keeps memory load as low as it can, since the stack depth is minimised, and as much syntactic and semantic information as possible is "chunked" together. The clause makes reference to "fewer or fewest DCs," since there will be occasions where the processor chooses two DCs instead of three (rather than one instead of two), as in the following example:

14. About what did you propose John would speak?

When John is encountered, there are two DCs on the stack, corresponding to about what and did you propose. The choice is then between combining John with did you propose, or adding a third element to the stack. Clause (i) causes the former strategy to be adopted. (This is not ultimately correct, because propose takes an S complement here.) In the vast majority of cases (in English), the choice is between forming a single DC and retaining two DCs.

Clause (ii) implies that no automatic syntactic principle is applied in cases where both or all analyses involve the construction of the same number of DCs. This clause is rather vague, and the paper does not try to provide a complete theory of what factors determine the choice of analysis in these cases. However, some suggestions will be made below. Note that a similar vagueness is found in all constraint-based theories (e.g., MacDonald, 1994; Spivey-Knowlton et al., 1993).

In terms of dependency CG, the two clauses correspond to an important contrast. Clause (i) refers to cases where one analysis involves a reduction between the category of the new word and the category currently on top of the stack. Hence, clause (i) makes reference to shift-reduce ambiguities. In shift-reduce ambiguities, the processor reduces in preference to shifting.5 In contrast, clause (ii) corresponds to the claim that no automatic syntactic preference applies in reduce-reduce ambiguities.
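Clause (i) can be stated as a simple filter over candidate analyses. The sketch below is my own illustration, not code from the paper: each candidate analysis is represented simply as the list of DCs it would leave on the stack, here for example 14 after John is encountered. Ties falling through to clause (ii) are not modelled.

```python
# Hypothetical sketch of the principle of dependency formation (PDF).
# A candidate analysis is modelled as the list of DCs it would leave on
# the stack after the new word is processed.

def apply_pdf(candidates):
    """Clause (i): keep only the analyses with the fewest DCs.
    If several survive (clause ii), they are pursued in parallel and
    resolved by nonsyntactic information, which is not modelled here."""
    fewest = min(len(stack) for stack in candidates)
    return [stack for stack in candidates if len(stack) == fewest]

# Example 14, after 'John': combining John with 'did you propose'
# leaves two DCs; shifting John would leave three. Clause (i) keeps
# only the two-DC analysis.
candidates = [
    ['[About what]', '[did you propose John]'],        # reduce: 2 DCs
    ['[About what]', '[did you propose]', '[John]'],   # shift: 3 DCs
]
print(apply_pdf(candidates))
# [['[About what]', '[did you propose John]']]
```

When every candidate leaves the same number of DCs (a reduce-reduce ambiguity), the filter returns them all, leaving the choice to the interactive mechanisms discussed below.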
Therefore, shift-reduce ambiguities differ from reduce-reduce ambiguities. Informally, these can be thought of as equivalent to attach-no attach ambiguities (it is unclear whether the new word or phrase should be attached or not) and attach-attach ambiguities (the new word or phrase should be attached, but there is more than one way to attach it), respectively.

5 Strictly, clause (i) is slightly more general than this. It also implies that an analysis involving two reductions would be preferred over one involving a single reduction.

I propose that the PDF is designed both to maximize the amount of incremental interpretation that is possible with a sentence fragment, and at


the same time to make the likelihood of having to reanalyze subsequently as low as possible. However, on this model, the overriding objective is to maximize incremental interpretation. This means that the processor will on occasion initially adopt an analysis that is unlikely to be the correct analysis. For example, the account does not pay any attention to subcategorization preferences when deciding whether to treat an NP as an argument of a verb or not. This will be made clearer below when experimental data are discussed.

With attach-attach ambiguities, let us simply assume that the processor chooses to pursue the more likely analysis, given all the information available to it at that point. All possible analyses are constructed in parallel, and the processor chooses between them: It selects the most likely analysis (given what it knows). The choice between analyses is conducted in the "propose-dispose" terms employed by Crain and Steedman (1985) and Altmann and Steedman (1988): Grammar proposes, context disposes. In this model, however, all relevant information can be brought to bear, not simply referential information. This may well be a simplification, because a more frequent analysis may be activated very quickly, and contextual factors may force the choice of this analysis before any other analyses are activated (e.g., Spivey-Knowlton et al., 1993). This paper does not address this particular distinction. Let us simply list some factors that may be more or less relevant in the resolution of attach-attach ambiguities.
These will not be discussed further:

(i) Plausibility of the different interpretations with respect to the meaning of the words (including selectional restriction information) (e.g., MacDonald, 1994; Spivey-Knowlton et al., 1993; Trueswell et al., 1994).
(ii) Compatibility of referential expressions with the discourse model set up by the context (e.g., Altmann & Steedman, 1988; Crain & Steedman, 1985; Haddock, 1989; Ni & Crain, 1990).
(iii) Frequency of use of the constructions on the different analyses (e.g., Ford et al., 1982; MacDonald, 1994).
(iv) Facilitation of one analysis on the basis of prior processing of constructions with the same (or perhaps similar) syntactic analysis. This could be based on immediate prior presentation (e.g., Branigan, Pickering, & Stewart, 1994; Frazier, Taft, Roeper, Clifton, & Ehrlich, 1984; Mehler & Carey, 1967). Alternatively, there could be a slow build-up of the preference for a more commonly used analysis (e.g., Mitchell & Cuetos, 1991); this would then be closely related to factor (iii) above.
(v) Intonation (e.g., Beach, 1991; Marslen-Wilson, Tyler, Warren, Grenier, & Lee, 1992).


(vi) Arguments may be preferentially bound to positions higher in a hierarchy, in terms of obliqueness of grammatical relations (Pollard & Sag, 1987, 1993), accessibility (e.g., Keenan & Comrie, 1977), or thematic roles (e.g., Jackendoff, 1972).
(vii) There may be some preferences based on the distance between a new phrase and subcategorizers for that phrase. For example, an adverb may preferentially attach to the more recent verb in examples like John said Mary left yesterday (cf., e.g., Frazier & Fodor, 1978). But it is very important to stress that, in examples such as this, any recency preference can only help adjudicate between analyses which are considered together. There is no automatic attachment to the recent verb, because both analyses require only a single dependency constituent.

One further point to be made is that ambiguity with respect to the subcategorization frame that is assumed for a verb (or other word) is not treated as a genuine ambiguity before the relevant arguments are encountered. For instance, there is no need to make a choice about whether eats is used transitively or intransitively after encountering the fragment John eats . . . . This is parallelism in a trivial sense, and would probably be adopted by most serial models of processing.

Dependency CG does not employ any special mechanisms such as transformations, traces, or coindexation to deal with unbounded dependencies. Therefore, there is no reason to assume a separate account of such constructions in parsing, and the same principles are assumed to hold for all constructions. Hence this model contrasts with Clifton and Frazier's (1989) active filler strategy, as well as with accounts that postulate a coreference processor (Nicol & Swinney, 1989) or separate subroutines for dealing with unbounded dependencies (Wanner & Maratsos, 1978). This account of parsing is a serial model with a rather restricted delay component.
Delay is only used when no dependency constituent can be formed. It is very important to note that delay is not the result of ambiguity, as it is in many delay models (e.g., Weinberg, 1993; Marcus, 1980; see also Kennedy, Murray, Jennings, & Reid, 1989). The processor delays because it cannot find an analysis at this point, not because it realizes that more than one possibility exists and it would be foolish to choose one too quickly. In this model, the processor never delays because it might be wrong, but only because it cannot produce a unified interpretation.6

6 There is an alternative model within this general approach to parsing which could be explored. Here, the processor does not employ any delay in syntax or in the associated semantic representation, but only dependency constituents can be integrated with general knowledge. The PDF would not be affected by this alternative, as the processor would still adopt the analysis consistent with the fewest DCs.


MODELING EXPERIMENTAL DATA

It is now time to show how the model can explain the processing of syntactically ambiguous constructions. I shall discuss shift-reduce ambiguities first, and then consider reduce-reduce ambiguities. Within each section, I shall make reference to both local and unbounded dependency constructions, as there is no reason to provide separate accounts of how they are processed. As individual construction types are discussed, I shall make some comparisons between this account and competing accounts, but a full comparison is not possible here.

Shift-Reduce Ambiguities

The prediction for subordinate clause ambiguities such as in example 15 is that the ambiguous phrase is treated as the object of the subordinate verb:

15. Though John kept on reading the story still bothered him.

The dependency diagram for the relevant fragment is given below:

16. Though John kept on reading the story This is incompatible with the correct analysis for the complete sentence, as the diagram below shows:

17. Though John kept on reading the story still bothered him.

In terms of DCs, the processor prefers example 18a to example 18b below, because example 18a involves a single DC, and hence allows the fragment to be interpreted (as a whole):

18a. [Though John kept on reading the story]PrS
18b. [Though John kept on reading]PrS/NP [the story]NP

Similarly, the model predicts immediate incorporation of contributions into the subordinate clause in example 19 and immediate attachment of to touch the wire to in this race in example 20 below:

19. Without her contributions failed to come in.


20. In this race to touch the wire is to die.

These predictions are borne out by Frazier (1979), Frazier and Rayner (1982), and subsequent work (e.g., Clifton, 1993; Ferreira & Henderson, 1991a; Pickering & Traxler, 1994). For instance, Frazier and Rayner found disruption in the region still bothered him in example 15. It is more controversial whether disruption occurs if the subordinate verb is preferentially intransitive [see, e.g., Ferreira & Henderson (1991b), Frazier (1987), and Trueswell et al. (1993) for discussion]. On the present account, subcategorization preferences are irrelevant to attach-no attach ambiguities. In this context, Mitchell (1987) found disruption even if the subordinate verb is normally considered to be intransitive (e.g., sneezed). Notice, however, that sneezed has a rare transitive use with a cognate object (e.g., John sneezed a big sneeze). There is of course no way to tell what the following NP will be before it is encountered. Hence the model predicts Mitchell's pattern of results. It realizes that sneezed is normally intransitive, but initially considers the possibility that it is being used transitively just in case. [See Trueswell et al. (1993) and Adams, Clifton, & Mitchell (1992) for alternative explanations of Mitchell's data.] As noted above, lexical preferences do not affect the predictions of the model.

For object/complement ambiguities like example 21 below, the PDF predicts that the object analysis will be chosen:

21. The father decided the punishment for stealing was too severe.

Here, we contrast the dependency diagram for the complete sentence and the object analysis for the ambiguous fragment:

22. The father decided the punishment for stealing was too severe.

23. The father decided the punishment for stealing.

The object analysis requires only a single DC after stealing (example 24a below), whereas the bare S complement analysis requires two DCs (example 24b below), so the object analysis is adopted:

24a. [The father decided the punishment for stealing]S
24b. [The father decided]S/S [the punishment for stealing]NP

22. The father decided the punishment for stealing was too severe.

23. The father decided the punishment for stealing. The object analysis requires only a single DC after stealing (example 24a below), whereas the bare S complement analysis requires two DCs (example 24b below), so the object analysis is adopted: 24a. [The father decided the punishment for stealing]s 24b. [The father decided]s/s[the punishment for stealing]sp


There is good evidence for some processing difficulty around was in sentences like 21 (Ferreira & Henderson, 1990; Frazier & Rayner, 1982; Pickering & Traxler, 1994; Rayner & Frazier, 1987; but cf. Holmes et al., 1987; Kennedy et al., 1989); many of these experiments contrast sentences like 21 with control sentences including the complementizer that. This finding is predicted by the PDF, because was indicates that the object analysis is in fact wrong. As lexical preferences cannot affect the initial choice of analysis, disruption is predicted (at some point) even if the main verb more commonly takes a bare complement than an NP object. This prediction is challenged by Trueswell et al. (1993), who argued that no difficulty is found with such verbs, but this issue is certainly extremely controversial (as it is with subordinate clause ambiguities like example 15 above).

This model analyses the processing of examples 15 and 21 above in essentially the same way. This may seem strange, given that example 15 is intuitively more disruptive to process than example 21. Pritchett (1992) and Gorrell (in press) claim that only example 15 causes conscious disruption to processing. But any such difference is not relevant to the present model, which is only concerned with the initial choice of analysis. Both examples 15 and 21 appear to cause disruption to processing at the point of disambiguation, and the current model merely seeks to explain this finding.

Many ambiguities in unbounded dependency constructions involve shift-reduce ambiguities. In example 3, repeated below as example 25, the model predicts that the unbounded dependency is formed between whom and believe immediately after the verb is reached:

25. I know whom you believe John supports.

This is because example 26a below is preferred to example 26b:

26a. [I know whom you believe]S
26b. [I know whom]S/(S/NP) [you believe]S/S

In example 26b, you believe takes a bare complement (it could also take a that complement), which cannot be combined with I know whom. The processor chooses example 26a because it consists of a single DC. In most cases, this makes identical predictions to the active filler strategy of Clifton and Frazier (1989), with the unbounded dependencies being formed as quickly as possible. Experimental evidence from plausibility manipulations (e.g., Garnsey, Tanenhaus, & Chapman, 1989; Tanenhaus, Stowe, & Carlson, 1985) and cross-modal priming (Nicol & Swinney, 1989; Swinney,


Ford, Frauenfelder, & Bresnan, 1988) supports this account. The model also predicts disruption when the NP John is encountered, because the sentence fragment now appears to be ungrammatical after the analysis given to I know whom you believe. Hence filled-gap effects (Clifton & Frazier, 1989; Crain & Fodor, 1985; Stowe, 1986) are captured by the model. Tanenhaus, Garnsey, and Boland (1990) found that filled-gap effects are affected by the plausibility of the fragment, but their results can be explained on the assumption that the dependency can be formed and then undone in such cases before the next word is reached [Hickok, Canseco-Gonzalez, Zurif, & Grimshaw (1991) provided support for this from cross-modal priming].

This account can also model the data discussed by Pickering and Barry (1991), who provided evidence that the processor does not wait until the proposed "gap" or "empty category" site before forming an unbounded dependency (see Pickering, 1993). In a sentence like 27 below, the purported empty category site is sentence final, after cup:

27. That's the saucer on which the man put the cup.

This is because the "canonical" word order is the man put the cup on which saucer. Pickering and Barry provided evidence (much of it from an analysis of nested constructions) that the unbounded dependency is in fact formed at put (see also Gibson & Hickok, 1993; Gorrell, 1993; Pickering, 1993). The current account allows the formation of a single DC when the verb put is reached:

28. [That's the saucer on which the man put]S/NP

Importantly, the single DC is formed before the purported gap location. Dependency CG does not employ wh-trace (or any other empty categories), but instead assumes that there is a direct association between the extracted element (on which) and its subcategorizer (the verb put). After encountering put, all the processor needs to do is to find a single postverbal NP.
Of course, this construction does not involve any local ambiguity.7

7 A complication is found in examples where the missing NP is not on the periphery of the relative clause, as in that's the cup which the man put on the saucer. See Steedman (1987), Moortgat (1988), and Pickering (1991) for discussion.

Reduce-Reduce Ambiguities

Traditionally, it was assumed that reduced relatives, like example 29 below, were automatically given the main clause analysis initially (e.g., Bever, 1970; Frazier, 1979):


29. The defendant examined by the lawyer turned out to be unreliable.

Nonsyntactic information such as plausibility should not affect this process. Ferreira and Clifton (1986) found evidence for this: Plausibility did not affect whether a garden path occurred on the disambiguating phrase by the lawyer. This suggested that some aspects of interpretation are very quick, but that semantic information played no role in the initial selection of the analysis. However, Trueswell et al. (1994) found contrary evidence with new materials, suggesting that the garden path could be avoided (see also Burgess, 1991; Burgess & Tanenhaus, 1992; MacDonald, 1994).

A particularly clear contrast is found between Trueswell et al.'s (1994) data and that of Mitchell (1987) (and Adams et al., 1992). Trueswell et al. showed that semantic information can override the assumption that the main clause analysis should be chosen instead of the reduced relative analysis. Mitchell suggested that the assumption that an NP should be treated as the object of a preceding verb cannot be overridden, however unlikely the association. This difference is directly reflected in the PDF, because only Trueswell et al.'s case involved a reduce-reduce ambiguity.

The next case to consider concerns the attachment of modifiers. In these cases, the suggestion that there may be some delay in forming DCs becomes important, as we shall see. Consider sentence 30 below:

30. The girl hit the boy with the book.

According to minimal attachment, the ambiguous PP should be automatically attached high. Rayner et al. (1983) found evidence from eye-tracking which was taken to support this proposal. However, Taraban and McClelland (1988) found contradictory evidence with materials identical in the relevant syntactic respects. This suggests that no automatic principle applies in such instances, but rather that a number of factors may affect the initial resolution process.
All other things being equal, readers may have a bias toward high attachment, but that bias can be overridden. The PDF classifies example 30 as involving a reduce-reduce ambiguity, hence making what appears to be the correct prediction.

There is, however, a complication. The string the girl hit the boy is a DC under either interpretation, so the PDF claims that the decision should be made at that point. Although this is possible, there is little information available to aid a decision at this point. Information about frequency of construction might help, as could referential information (Altmann & Steedman, 1988); for instance, low attachment would be preferred if more than one boy had been introduced


by the context. But the semantic information found in the final NP is not present yet. I suggest that the parser delays its decision until it encounters this final NP. Function words introducing a new phrase or clause are not immediately associated with the higher clause, but must instead be combined with the new unit first. This means that the category PP/NP, for with, is placed on the stack and is not combined with S/PP immediately (S/PP is identified with two separate semantic representations corresponding to the two readings). The intuitive justification for this proposal is that it is rare for examples like 30 to engender a marked garden path effect; this would quite regularly happen if the decision were made immediately after the preposition was processed. In contrast, cases like example 31 below appear to cause marked garden path effects. Here, the crucial difference is that the conflicting information comes after the NP has been processed:

31. The spy saw the cop with the binoculars with the telescope.

In this case, with the binoculars is attached to saw, but the appearance of with the telescope forces this to be reappraised. The use of a limited delay component in processing has been advocated on many occasions, notably by Perfetti (1990), whose account bears some similarity to the current model in this respect.

Very similar considerations apply to ambiguities about complex NP modification, such as example 2, repeated below:

32. The journalist interviewed the daughter of the colonel who had had an accident.

The evidence suggests that no automatic syntactic principle applies in such cases. This conflicts with late closure, which predicts that the modifier attaches low, to colonel. To summarize briefly, Cuetos and Mitchell (1988) found high attachment preferences in Spanish, and Carreiras and Clifton (in press) found no low attachment preference even in English.
The PDF treats this as a reduce-reduce ambiguity, with the question being whether to form a single DC with who as the dependent of daughter or of colonel. Again, I suggest that the parser normally delays its decision until after it encounters the relative pronoun, probably until the verb is reached. Note that the same prediction is made with adverbial attachment ambiguities like example 33 below:

33. John said Bill died yesterday.


In this example, however, there is a clear preference to attach yesterday to the most recent verb died (e.g., Frazier, 1979; Frazier & Fodor, 1978), in contrast to example 32. On this model, the recency preference is explained as a bias involved in the resolution of an attach-attach conflict, as discussed above. This predicts that the recency preference could in principle be overridden by enough support for another analysis, in contrast to examples 15 and 21 above. This would obviously require careful empirical investigation.

The local ambiguity in cases of coordination like example 34 may be a special case:

34. I saw the girl and her sister laughed.

The empirical evidence is extremely limited, and it is not clear how general the low attachment preference found by Frazier (1978) is. The PDF does not have anything to say about such examples. The reason is that this theory does not deal with the incremental processing of coordinate constructions. For example, consider example 35 below:

35. John loves Mary and Sue.

The expression John loves Mary is not a DC in this sentence, but it would of course appear to be one after Mary is reached. On an obvious account, the processor would have to backtrack when Sue is reached. Such a model would clearly not be particularly incremental and would require considerable backtracking. It would appear to be preferable to construct an account that allowed Sue to be treated as conjoined with Mary without backtracking, but this is not possible within dependency CG as currently formulated [see Pickering & Barry (1993) for the grammatical account].

There are some attach-attach ambiguities involving unbounded dependencies. In some cases, the filler can be associated with more than one argument slot of the verb, as in example 36 below:

36. Which boy did you show the girl?

There are two possible DCs corresponding to the fragment which boy did you show, and hence there is no automatic way to choose an analysis.
In this case, there does appear to be a preference for the reading which involves showing the girl a boy (Fodor, 1978). One possibility is that there is a preference for associating the filler with the direct object position in preference to the indirect object position, since direct object is less oblique


(Pollard & Sag, 1987) and more accessible (Keenan & Comrie, 1977). As Pickering (1993) pointed out, this finding is not compatible with the active filler strategy (Clifton & Frazier, 1989), since the preferred reading is the one with the late gap.

In other cases, there is an ambiguity between an unbounded dependency analysis and an alternative that involves local dependencies alone. This is found in complement clause/relative clause ambiguities like that in example 37:

37. The receptionist informed the doctor that the journalist had phoned about the events.

The model predicts a reduce-reduce conflict taking place at had phoned, assuming that the processor does not require a decision to be made on encountering the function word that (as in the analysis of example 35 above). This is compatible with the conclusions of Crain and Steedman (1985) and Altmann, Garnham, and Dennis (1992), who suggested that the ambiguity was not resolved according to a simple syntactic principle. They argued that the processor would adopt the analysis which was more felicitous in terms of the discourse context (though they assumed that the decision was made by the time that that had been processed). In addition, Nicol and Pickering (1993) and Hickok (1993) found evidence from priming that the relative clause reading was active at the offset of phoned. Hickok also found that the complement clause reading was adopted for the complete sentence in an off-line task. Nicol and Pickering suggested that no analysis has been chosen before this point, but instead the processor constructs analyses for the fragments the receptionist informed the doctor and the journalist. In current terms, the decision is delayed because the two fragments do not form a single DC on either analysis. When had phoned is reached, a single analysis can be formed in two different ways.
The choice is made at this point, and the complement clause analysis is normally chosen (at least in isolation), but the relative clause analysis is active after phoned, while the decision is being made. Notice, however, that it is necessary that the complementizer that is not attached to the main clause fragment immediately, or it would force disambiguation to one reading or the other; this attachment has to be delayed. On this account, priming can be used to detect the momentary consideration of a reading which is subsequently dropped. It is not clear how these results can be explained in any standard serial framework, including the garden path model (Frazier, 1979) or the interactive model of Crain and Steedman (1985) and Altmann and Steedman (1988).


CONCLUSION

I have argued that the processor attempts to interpret what it encounters as quickly as possible (with the exception of some possible delay with function words). In terms of the theory of dependency categorial grammar, this means that the processor forms dependency constituents whenever it can. This provides an automatic procedure for the initial resolution of some syntactic ambiguities. However, other ambiguities involve a choice between two analyses which both result in a single dependency constituent. In these cases, incremental interpretation is possible under either analysis, and so the processor constructs both analyses and decides between them in an interactive manner. This produces a straightforward dichotomy between the way in which two classes of constructions are processed. It appears that there is a considerable amount of experimental evidence in support of this distinction.

REFERENCES

Abney, S. (1989). A computational model of human parsing. Journal of Psycholinguistic Research, 18, 129-144.
Adams, B., Clifton, C., & Mitchell, D. (1992). Lexical guidance in sentence processing: Further support for a filtering account. Paper presented at the Psychonomics Society Meeting, St. Louis.
Ades, A., & Steedman, M. J. (1982). On the order of words. Linguistics and Philosophy, 4, 517-588.
Ajdukiewicz, K. (1935). Die syntaktische Konnexität. Studia Philosophica, 1, 1-27. (Translated as Syntactic connection, in S. McCall (Ed.), Polish logic: 1920-1939, pp. 207-231. Oxford, England: Oxford University Press.)
Altmann, G. T. M., Garnham, A., & Dennis, Y. I. L. (1992). Avoiding the garden path: Eye-movements in context. Journal of Memory and Language, 31, 685-712.
Altmann, G. T. M., & Steedman, M. J. (1988). Interaction with context during human sentence processing. Cognition, 30, 191-238.
Bar-Hillel, Y. (1953). A quasi-arithmetical notation for syntactic description. Language, 29, 47-58.
Barry, G. (1992). Derivation and structure in categorial grammar. Unpublished Ph.D. thesis, University of Edinburgh.
Barry, G., & Pickering, M. (1990). Dependency and constituency in categorial grammar. In G. Barry & G. Morrill (Eds.), Edinburgh working papers in cognitive science: Vol. 5. Studies in categorial grammar. Edinburgh: Centre for Cognitive Science, University of Edinburgh.
Beach, C. (1991). The interpretation of prosodic patterns at points of syntactic structure ambiguity: Evidence for cue trading relations. Journal of Memory and Language, 30, 644-663.
Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. R. Hayes (Ed.), Cognition and the development of language. New York: Wiley.
Bloomfield, L. (1933). Language. New York: Holt, Rinehart and Winston.


Branigan, H., Pickering, M. J., & Stewart, A. (1994). Syntactic priming in language comprehension. Poster presented at the Seventh CUNY Sentence Processing Conference, New York, March 1994.
Burgess, C. (1991). Interaction of semantic, syntactic, and visual factors in syntactic ambiguity resolution. Unpublished doctoral dissertation, University of Rochester, Rochester, NY.
Burgess, C., & Tanenhaus, M. (1992). The interaction of semantic and parafoveal information in syntactic ambiguity resolution. Unpublished manuscript.
Carreiras, M., & Clifton, C. (in press). Relative clause interpretation preferences in Spanish and English. Language and Speech.
Chater, N., Pickering, M., & Milward, D. (1994). What is incremental interpretation? Unpublished manuscript.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Clifton, C. (1993). Thematic roles in sentence parsing. Canadian Journal of Experimental Psychology, 47, 222-246.
Clifton, C., & Frazier, L. (1989). Comprehending sentences with long distance dependencies. In G. Carlson & M. Tanenhaus (Eds.), Linguistic structure in language processing. Dordrecht: Kluwer.
Crain, S., & Fodor, J. D. (1985). How can grammars help parsers? In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing. Cambridge, England: Cambridge University Press.
Crain, S., & Steedman, M. J. (1985). On not being led up the garden path: The use of context by the psychological parser. In D. Dowty, L. Karttunen, & A. Zwicky (Eds.), Natural language parsing. Cambridge, England: Cambridge University Press.
Cuetos, F., & Mitchell, D. (1988). Cross-linguistic differences in attachment preferences: Restrictions on the use of the late closure strategy in Spanish. Cognition, 30, 73-105.
Ferreira, F., & Clifton, C. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348-368.
Ferreira, F., & Henderson, J. (1990). The use of verb information in syntactic parsing: A comparison of evidence from eye movements and word-by-word self-paced reading. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 555-568.
Ferreira, F., & Henderson, J. (1991a). Recovery from misanalyses of garden-path sentences. Journal of Memory and Language, 30, 725-745.
Ferreira, F., & Henderson, J. (1991b). How is verb information used during syntactic parsing? In G. B. Simpson (Ed.), Understanding word and sentence. Amsterdam: North-Holland.
Fodor, J. D. (1978). Parsing strategies and constraints on transformations. Linguistic Inquiry, 9, 427-473.
Ford, M., Bresnan, J. W., & Kaplan, R. M. (1982). A competence-based theory of syntactic closure. In J. Bresnan (Ed.), The mental representation of grammatical relations. Cambridge, MA: MIT Press.
Frazier, L. (1979). On comprehending sentences: Syntactic parsing strategies. Bloomington: Indiana University Linguistics Club.
Frazier, L. (1987). Sentence processing. In M. Coltheart (Ed.), Attention and performance XII. Hillsdale, NJ: Erlbaum.
Frazier, L., & Fodor, J. D. (1978). The sausage machine: A new two-stage parsing model. Cognition, 6, 291-325.


Frazier, L., & Rayner, K. (1982). Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology, 14, 178-210.
Frazier, L., Taft, L., Roeper, T., Clifton, C., & Ehrlich, K. (1984). Parallel structure: A source of facilitation in sentence comprehension. Memory & Cognition, 12, 421-430.
Gaifman, H. (1965). Dependency systems and phrase structure systems. Information and Control, 8, 304-337.
Garnsey, S. M., Tanenhaus, M. K., & Chapman, R. M. (1989). Evoked potentials and the study of sentence comprehension. Journal of Psycholinguistic Research, 18, 51-60.
Gibson, E. (1991). A computational theory of linguistic processing: Memory limitations and processing breakdown. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA.
Gibson, E., & Hickok, G. (1993). Sentence processing with empty categories. Language and Cognitive Processes, 8, 147-161.
Gorrell, P. (1993). Evaluating the direct association hypothesis: A reply to Pickering and Barry (1991). Language and Cognitive Processes, 8, 129-146.
Gorrell, P. (in press). Syntax and perception. Cambridge, England: Cambridge University Press.
Haddock, N. (1989). Computational models of incremental semantic interpretation. Language and Cognitive Processes, 4, 337-368.
Hays, D. (1964). Dependency grammar: A formalism and some observations. Language, 40, 511-525.
Hickok, G. (1993). Parallel parsing: Evidence from reactivation in garden path sentences. Journal of Psycholinguistic Research, 22, 239-250.
Hickok, G., Canseco-Gonzalez, E., Zurif, E., & Grimshaw, J. (1991). Modularity in locating gaps. Poster presented at the 1991 CUNY Conference, Rochester, NY.
Holmes, V. M., Kennedy, A., & Murray, W. (1987). Syntactic structure and the garden path. Quarterly Journal of Experimental Psychology, 39A, 277-294.
Hudson, R. A. (1984). Word grammar. Oxford, England: Basil Blackwell.
Hudson, R. A. (1990). English word grammar. Oxford, England: Basil Blackwell.
Jackendoff, R. (1972). Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.
Jackendoff, R. (1977). X' syntax: A study of phrase structure. Cambridge, MA: MIT Press.
Keenan, E., & Comrie, B. (1977). Noun phrase accessibility and universal grammar. Linguistic Inquiry, 8, 63-100.
Kennedy, A., Murray, W., Jennings, F., & Reid, C. (1989). Parsing complements: Comments on the generality of the principle of minimal attachment. Language and Cognitive Processes, 4, 51-76.
Kimball, J. (1973). Seven principles of surface structure parsing in natural language. Cognition, 2, 15-47.
Lambek, J. (1958). The mathematics of sentence structure. American Mathematical Monthly, 65, 154-170.
MacDonald, M. C. (1994). Probabilistic constraints and syntactic ambiguity resolution. Language and Cognitive Processes, 9, 157-201.
MacDonald, M. C., Pearlmutter, N., & Seidenberg, M. (1993). The lexical nature of syntactic ambiguity resolution (Beckman Institute Cognitive Science Technical Report UIUC-CI-CS-93-13). Urbana: University of Illinois.
Marcus, M. (1980). A theory of syntactic recognition for natural language. Cambridge, MA: MIT Press.


Marslen-Wilson, W. D. (1973). Linguistic structure and speech shadowing at very short latencies. Nature, 244, 522-523.
Marslen-Wilson, W. D. (1975). Sentence perception as an interactive parallel process. Science, 189, 226-228.
Marslen-Wilson, W. D., Tyler, L. K., Warren, P., Grenier, P., & Lee, C. S. (1992). Prosodic effects in minimal attachment. Quarterly Journal of Experimental Psychology, 45A, 73-87.
Matthews, P. (1981). Syntax. Cambridge, England: Cambridge University Press.
Mehler, J., & Carey, P. (1967). Role of surface and base structure in the perception of sentences. Journal of Verbal Learning and Verbal Behavior, 6, 335-338.
Mitchell, D. C. (1987). Lexical guidance in human parsing: Locus and processing characteristics. In M. Coltheart (Ed.), Attention and performance XII. Hillsdale, NJ: Erlbaum.
Mitchell, D., & Cuetos, F. (1991). The origins of parsing strategies. Unpublished manuscript.
Moortgat, M. (1988). Categorial investigations: Logical and linguistic aspects of the Lambek calculus. Dordrecht: Foris.
Ni, W., & Crain, S. (1990). How to resolve structural ambiguities. In Proceedings of the 20th Meeting of the North Eastern Linguistics Society. Pittsburgh.
Nicol, J., & Pickering, M. (1993). Processing syntactically ambiguous sentences: Evidence from semantic priming. Journal of Psycholinguistic Research, 22, 207-237.
Nicol, J., & Swinney, D. (1989). The role of structure in coreference assignment during sentence comprehension. Journal of Psycholinguistic Research, 18, 5-19.
Perfetti, C. (1990). The cooperative language processors: Semantic influences in an autonomous syntax. In D. A. Balota, G. B. Flores d'Arcais, & K. Rayner (Eds.), Comprehension processes in reading (pp. 205-230). Hillsdale, NJ: Erlbaum.
Pickering, M. J. (1991). Processing dependencies. Unpublished Ph.D. thesis, University of Edinburgh.
Pickering, M. J. (1993). Direct association and sentence processing: A reply to Gorrell and to Gibson and Hickok. Language and Cognitive Processes, 8, 163-196.
Pickering, M. J., & Barry, G. (1991). Sentence processing without empty categories. Language and Cognitive Processes, 6, 229-259.
Pickering, M. J., & Barry, G. (1993). Dependency categorial grammar and coordination. Linguistics, 31, 855-902.
Pickering, M. J., & Traxler, M. (1994). Plausibility and recovery from garden paths. Manuscript submitted for publication.
Pollard, C., & Sag, I. A. (1987). Information-based syntax and semantics. Stanford, CA: CSLI.
Pollard, C., & Sag, I. A. (1993). Head-driven phrase structure grammar. Stanford, CA, and Chicago: CSLI and University of Chicago Press.
Pritchett, B. (1988). Garden path phenomena and the grammatical basis of language processing. Language, 64, 539-576.
Pritchett, B. (1992). Grammatical competence and parsing performance. Chicago: University of Chicago Press.
Pulman, S. G. (1986). Grammars, parsers and memory limitations. Language and Cognitive Processes, 1, 197-225.
Rayner, K., Carlson, M., & Frazier, L. (1983). The interaction of syntax and semantics during sentence processing: Eye movements in the analysis of semantically biased sentences. Journal of Verbal Learning and Verbal Behavior, 22, 358-374.


Rayner, K., & Frazier, L. (1987). Parsing temporarily ambiguous complements. Quarterly Journal of Experimental Psychology, 39A, 657-673.
Shieber, S., & Johnson, M. (1993). Variations on incremental interpretation. Journal of Psycholinguistic Research, 22, 287-318.
Spivey-Knowlton, M., Trueswell, J., & Tanenhaus, M. (1993). Context and syntactic ambiguity resolution. Canadian Journal of Experimental Psychology, 47, 276-309.
Stabler, E. (1991). Avoid the pedestrian's paradox. In R. Berwick, S. Abney, & C. Tenny (Eds.), Principle-based parsing: Computation and psycholinguistics. Dordrecht: Kluwer Academic.
Steedman, M. J. (1987). Combinatory grammars and parasitic gaps. Natural Language and Linguistic Theory, 5, 403-439.
Steedman, M. J. (1989). Grammar, interpretation and processing from the lexicon. In W. Marslen-Wilson (Ed.), Lexical representation and process. Cambridge, MA: MIT Press.
Steedman, M. J. (1992). Grammars and processors (Tech. Rep. MS-CIS-92-52). Philadelphia: Department of Computer and Information Science, University of Pennsylvania.
Stowe, L. A. (1986). Parsing WH-constructions: Evidence for on-line gap location. Language and Cognitive Processes, 1, 227-245.
Stowe, L. A. (1989). Thematic roles and sentence comprehension. In G. N. Carlson & M. K. Tanenhaus (Eds.), Linguistic structure in language processing. Dordrecht: Kluwer.
Swinney, D., Ford, M., Frauenfelder, U., & Bresnan, J. (1988). On the temporal course of gap-filling and antecedent assignment during sentence comprehension. Unpublished manuscript.
Tanenhaus, M., Garnsey, S., & Boland, J. (1990). Combinatory lexical information and language comprehension. In G. T. M. Altmann (Ed.), Cognitive models of speech processing. Cambridge, MA: MIT Press.
Tanenhaus, M., Stowe, L., & Carlson, G. (1985). The interaction of lexical expectations and pragmatics in parsing filler-gap constructions. In Proceedings of the Seventh Annual Cognitive Science Society Meetings. Hillsdale, NJ: Erlbaum.
Taraban, R., & McClelland, J. L. (1988). Constituent attachment and thematic role assignment in sentence processing: Influence of content-based expectations. Journal of Memory and Language, 27, 597-632.
Tesnière, L. (1959). Éléments de syntaxe structurale. Paris: Klincksieck.
Trueswell, J., Tanenhaus, M., & Garnsey, S. (1994). Semantic influences on parsing: Use of thematic role information in syntactic disambiguation. Journal of Memory and Language, 33, 285-318.
Trueswell, J., Tanenhaus, M., & Kello, C. (1993). Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 528-553.
Tyler, L., & Marslen-Wilson, W. (1977). The on-line effects of semantic context on syntactic processing. Journal of Verbal Learning and Verbal Behavior, 16, 645-660.
Wanner, E., & Maratsos, M. (1978). An ATN approach to comprehension. In M. Halle, J. Bresnan, & G. A. Miller (Eds.), Linguistic theory and psychological reality. Cambridge, MA: MIT Press.
Weinberg, A. (1993). Minimal commitment: A parsing theory for the nineties. Journal of Psycholinguistic Research, 22, 339-364.
