SYNTACTIC PROCESSING AND FUNCTIONAL SENTENCE PERSPECTIVE
Martin Kay
XEROX PARC

This paper contains some ideas that have occurred to me in the course of some preliminary work on the notion of reversible grammar. In order to make it possible to generate and analyze sentences with the same grammar, represented in the same way, I was led to consider restrictions on the expressive power of the formalism that would be acceptable only if the structures of sentences contained more information than American linguists have usually been prepared to admit. I hope to convey to you some of my surprise and delight in finding that certain linguists of the Prague school argue for the representation of this same information on altogether different grounds.

1. REVERSIBLE GRAMMARS

For me, a grammar is not so much a set of well-formedness conditions on strings of words as a relation between strings of words (sentences) and structures. Usually it is constructive in the sense that there is a device that interprets it to convert either structures into sentences (a generator) or sentences into structures (a parser). A grammar for which both a generator and a parser can be found is what I call reversible and, of course, what I am interested in is not so much particular reversible grammars as a formalism which guarantees reversibility for any grammar that adheres to it. Context-free grammars are clearly reversible in this sense and transformational grammars are clearly not. Augmented Transition Networks (ATNs) are reversible provided some quite reasonable restrictions are placed on the operations that can be performed on registers. This is not to say that it is a trivial matter to obtain a generator from an arbitrary ATN grammar.

It goes without saying that the composition of a generator and a parser interpreting the same reversible grammar on the same sentence or structure does not, in general, perform the identity transformation. Frequently, sentences are ambiguous and structures often correspond to sets of more than one paraphrase. Parsing followed by generation of each member of the resulting set of structures will therefore yield the grammatical paraphrases of the original under all grammatical interpretations.

The practical advantages of a reversible grammar are obvious. A computer program that engages in any kind of conversation must both parse and generate and it would be economical to base both processes on the same grammar. But, to me, the theoretical appeal is stronger. It is plausible that we have something in our heads that fills the function I am ascribing to grammar, though I am not insensitive to the claims of those who deny this. But it is altogether implausible that we have two such things, one for parsing and one for generation, essentially unrelated to one another.

A left-to-right generator is one that deposits words into the output string one after another, in left-to-right sequence. A left-to-right parser is one that examines the words of the input string one after another, from left to right. A left-to-right reversible grammar is one for which there is a left-to-right generator and a left-to-right parser. Once again, it is clear that context-free grammars, in the usual notation, meet the requirements. ATN grammars probably do not. They certainly do not if we exclude the possibility of entirely reworking the grammar and presenting it in an entirely new form to the generator. The kind of grammar I have in mind would require no such reworking. Intuitively, the notion is a simple one. It is very like an ATN in that it analyses sentences by moving through a network, examining the input string at each transition and, if the current symbol meets the conditions specified for the transition, assigning it to a register. The generator would also move through the network, making exactly the same transitions, but depositing the contents of registers into the string at each step.

2. THE PROCESSOR

Generators and parsers for the kind of reversible grammar I have in mind could be implemented in a great variety of ways. One of the simplest I know would be to use a version of the General Syntactic Processor (GSP). GSP contains:

(1) a grammar in the form of a transition network, that is, a directed graph in which the permissible transitions between states are represented by arcs, each labeled with a more or less complicated set of actions which determine the applicability of the arc and cause side effects, such as the assignment of values to registers,
(2) an agenda of tasks to be carried out,
(3) a chart, that is, a directed graph consisting of vertices and edges which represents the sentence being analyzed or generated together with its component parts--phrases, intermediate derivations, or whatever--which, together with the agenda, completely encapsulates the state of the entire processor at any given point in its operation,
(4) a set of scheduling rules whose job is to determine the order in which the tasks on the agenda will be carried out, and
(5) the interpreter itself.
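As a rough illustration only, the five components of GSP might be laid out as in the following Python sketch. Nothing here is taken from GSP itself: the names `Edge`, `LEXICON`, `GRAMMAR` and `interpret` are hypothetical, arcs are reduced to functions from one edge to a list of new edges, and the two-arc "grammar" is a toy stand-in for a transition network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Edge:
    """A chart edge spanning the vertices start..end."""
    start: int
    end: int
    label: str             # a word or a category name
    complete: bool = True  # incomplete edges still await more material


Arc = Callable[[Edge], List[Edge]]   # hypothetical: an arc maps an edge to new edges
Task = Tuple[str, Edge]              # (arc name, edge to apply it to)

# (1) A toy stand-in for the grammar: two arcs that each look at a single edge.
LEXICON = {"they": "Pronoun", "planes": "Noun"}
GRAMMAR: Dict[str, Arc] = {
    "lexical": lambda e: [Edge(e.start, e.end, LEXICON[e.label])] if e.label in LEXICON else [],
    "bare-np": lambda e: [Edge(e.start, e.end, "NP")] if e.label in ("Pronoun", "Noun") else [],
}


def interpret(words: List[str], choose: Callable[[List[Task]], int]) -> set:
    """(5) The interpreter: carry out tasks from the agenda until none remain."""
    chart: List[Edge] = []    # (3) the chart
    agenda: List[Task] = []   # (2) the agenda

    def introduce(edge: Edge) -> None:
        # Every new edge puts one task per arc on the agenda.
        if edge not in chart:
            chart.append(edge)
            agenda.extend((name, edge) for name in GRAMMAR)

    for i, word in enumerate(words):
        introduce(Edge(i, i + 1, word))
    while agenda:
        name, edge = agenda.pop(choose(agenda))   # (4) a scheduling rule picks the task
        for new_edge in GRAMMAR[name](edge):
            introduce(new_edge)
    return set(chart)


# Two different scheduling rules over the same grammar and input.
fifo = interpret(["they", "planes"], lambda agenda: 0)
lifo = interpret(["they", "planes"], lambda agenda: len(agenda) - 1)
```

In this toy the scheduling rule is just a function that picks an index into the agenda, so first-in-first-out and last-in-first-out strategies can be swapped without touching the grammar or the chart.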
Edges in the chart are either complete or incomplete. Complete edges represent completely specified words or phrases and, if there is a path through the chart from one edge to another, it is because the first precedes the second in temporal, or left-to-right, sequence. If there is no path from one to another, then they belong to alternative hypotheses about the structure of the sentence. So, the sentence "they are flying planes" has, let us say, two analyses, each consisting of a noun phrase followed by a verb phrase. But there is no path between the phrases in one analysis and those in the other. The verb phrase in one analysis consists of the verb "are" followed by the noun phrase "flying planes", which are therefore adjacent on the same path, but there is no path from either of them to the verb phrase they make up because this is an alternative analysis of the same set of words.

An incomplete edge represents part of a phrase together with an indication of what would have to be added to complete it. For example, an incomplete noun phrase might include the string "the big black" plus an indication that a following noun, possibly preceded by some more adjectives, would complete it. A special kind of incomplete edge is an empty edge. Empty edges are successors of themselves. In other words, they are always incident from and to the same vertex, reflecting the fact that they represent no part of the sentence, but merely the potential for finding some structural component. The specification of how an incomplete edge can be completed takes the form of one or more arcs in the grammar, each paired with a direction--left or right. If the direction is right, then completion can be achieved by following sequences of arcs incident from the given state; if it is left, then arcs incident to the state must be followed.

The interpreter uses the scheduling rules to choose an item on the agenda which it then carries out. If all the tasks that were ever put on the agenda in the course of generation or analysis were carried out in an arbitrary order, then all results that the grammar allowed would be obtained sooner or later. The scheduling rules formalize strategies of one kind and another. They are presumably designed so as to shorten the time required to reach a result which is, in some sense, acceptable, at which time the remaining entries on the agenda can simply be abandoned.

The typical task is an attempt to apply an arc from the grammar to an edge in the chart. If the arc applies successfully, some new material will be added to the chart. In generation, the new material typically consists of one or two new edges constituting a sequence with the same end points as those of the initial edge. Usually, no more than one of the newly introduced edges will be incomplete. Thus, there might be a task in which an arc was applied to the noun-phrase edge representing "the big black dog" and which resulted in the complete article "the" and the incomplete noun phrase "big black dog". In parsing, the task specifies one or two edges, one of which is incomplete. The idea is to attempt to take the incomplete edge at least one step nearer to completion by incorporating the other edge. If only one edge is specified, then the new edge will have the same end points as the original, but will presumably be differently labeled.

Within this version of GSP, top-to-bottom, left-to-right parsing, in the manner of an ATN parser, proceeds broadly as follows:

1. Whenever, as the result of introducing a new edge into the chart, a new sequence consisting of an incomplete edge followed by a complete one comes into existence, put a new task on the agenda for each of the "category" arcs named on the incomplete edge. When one of these tasks is executed, the arc will be applied to the complete edge giving rise, if successful, to a new edge, complete or incomplete.
2. Whenever a new incomplete edge is introduced that names a "Pop" or "Jump" arc, create tasks that will cause these to be carried out.
3. Place an empty sentence edge before the first word of the sentence.

The process starts with step 3, which immediately causes a sequence of instances of steps 1 and 2.

An incomplete edge represents a stack frame in the ATN processor. It is labeled with a set of registers and it spans a portion of the chart representing the part of the string so far analyzed. "Category" arcs are applied to an incomplete edge and a complete edge immediately to its right. If successful, the result is a new incomplete edge. "Pop" arcs produce a complete edge, in exchange for an incomplete one. "Jump" arcs produce an incomplete edge in exchange for an incomplete edge, the differences being in the arcs that specify how to proceed towards completion, and possibly in the label.

It turns out that the mechanism of recursive calls that "Push" and "Pop" arcs provide for is embraced by the devices already described. Suppose that sentences are to be analyzed as consisting of a noun phrase followed by a verb phrase and that a noun phrase, say "the big black dog" has somehow been recognized at the beginning of the sentence. This means that there will be a complete edge representing this noun phrase and an incomplete sentence edge which has the same end points with an arc specifying that a verb phrase with a singular, third-person verb, is to follow. Since the grammar contains a subnetwork giving the structure of verb phrases, an empty edge labeled with the category "verb phrase" is introduced following that incomplete sentence provided there is not one already there. In due course, this will presumably cause a complete verb phrase to appear. The general principle is this: whenever an incomplete edge specifies a "category" arc for which there is a corresponding subnetwork, an empty edge is created following that one for each of the initial arcs in the subnetwork in the hope that this will lead to the creation of a new complete edge that the "category" arc can be successfully applied to.
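The three steps for top-to-bottom parsing, together with the principle of creating empty edges for subnetworks, can be tried out in a small Python sketch. It is a sketch under loud assumptions, not GSP: context-free rules in the hypothetical table `RULES` stand in for the subnetworks, registers and "Jump" arcs are left out, the agenda is a plain first-in-first-out queue, and the function only recognizes a sentence rather than building its structure.

```python
from collections import deque
from typing import NamedTuple, Tuple

# Hypothetical toy grammar: context-free rules stand in for subnetworks.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("they",), ("planes",), ("flying", "planes")],
    "VP": [("are", "NP"), ("are", "flying", "NP")],
}


class Edge(NamedTuple):
    start: int
    end: int
    label: str
    needed: Tuple[str, ...] = ()   # what an incomplete edge still expects
    complete: bool = True


def recognize(words, goal="S"):
    chart, agenda = set(), deque()

    def introduce(edge):
        if edge in chart:
            return
        chart.add(edge)
        if edge.complete:
            # Step 1, seen from the right: pair with waiting edges on the left.
            for left in list(chart):
                if (not left.complete and left.needed
                        and left.end == edge.start and left.needed[0] == edge.label):
                    agenda.append(("category", left, edge))
        elif not edge.needed:
            # Step 2: nothing more is expected, so schedule a "Pop".
            agenda.append(("pop", edge, None))
        else:
            wanted = edge.needed[0]
            # Empty edges for the subnetwork of the wanted category, if any.
            for rhs in RULES.get(wanted, ()):
                introduce(Edge(edge.end, edge.end, wanted, rhs, False))
            # Step 1, seen from the left: pair with complete edges on the right.
            for right in list(chart):
                if right.complete and right.start == edge.end and right.label == wanted:
                    agenda.append(("category", edge, right))

    for i, word in enumerate(words):
        introduce(Edge(i, i + 1, word))
    for rhs in RULES[goal]:                  # Step 3: empty sentence edges
        introduce(Edge(0, 0, goal, rhs, False))
    while agenda:
        kind, left, right = agenda.popleft()
        if kind == "category":
            # The incomplete edge absorbs the complete edge to its right.
            introduce(Edge(left.start, right.end, left.label, left.needed[1:], False))
        else:
            # "Pop": exchange the incomplete edge for a complete one.
            introduce(Edge(left.start, left.end, left.label))
    return Edge(0, len(words), goal) in chart


ok = recognize("they are flying planes".split())
bad = recognize("they flying are".split())
```

Because the chart is a set, the two analyses of "are flying planes" end up as one complete verb-phrase edge here; keeping them apart would need edges that also record their parts.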
3. THE USE OF REGISTERS

The principal problem with this simple plan, when applied to reversible grammars, is that the registers cannot be guaranteed to have the necessary contents at the time required. One of the strengths of the ATN formalism is that it allows the parser to "change its mind". The canonical example is the passive construction. The first noun phrase in the sentence is assigned to the subject register. But when a passive verb--part of the verb "be" and a transitive past participle--has been encountered, the contents of the subject register are simply transferred to the object register. If a "by" phrase follows, its object will later go into the subject register. In this way, a great deal of backing up is avoided. In generating a passive sentence, it is clear that the first step cannot be to deposit the contents of the subject register in the first position. An alternative might be to decide which register to use in filling the first position by examining a "voice" register, using the object instead of the subject register if its value is "passive". But this would require us to assign a value to the voice register in parsing before the relevant evidence is in. It would work only if the contents of the voice register were changed at the same time as the passive verb was recognized and the contents of the subject register were moved to the object register. It could indeed be made to work, but the solution is unsatisfactory because it does not reflect any general principle that carries over to other cases. More important, it violates a principle that must be regarded as fundamental for the achievement of reversibility in general, namely that each elementary operation that an arc in the grammar can specify must have two systematically related interpretations, for use in generation and parsing respectively.

Another solution would be to assign the first noun phrase to a neutral register when it is encountered in parsing, and only to copy it into the subject or object registers when it was finally established which one it belonged in. This neutral register would have to be reflected directly in the structure assigned to the sentence because it would be from there that the first noun phrase in the sentence would have to be taken by the generator. One advantage of this scheme is that a passive marker would no longer be required in the structure of passive sentences. Instead, the voice of a sentence would be determined by the generator on the basis of which register--subject or object--had the same contents as the special neutral register. The general principle behind this strategy is that the contents of a register are never changed in the course of either generation or parsing. This is the solution I advocate.

A name is needed for the neutral register, and topic, or theme, suggest themselves immediately. But consider the case of cleft sentences like "It was Brutus that killed Caesar" and "It was Caesar that Brutus killed" and assume, for the sake of the argument, that these are to be handled as main clauses and not by the relative-clause mechanism. Once again, the underlying grammatical function of the first noun phrase is not known when the parser first encounters it. The problem can be solved by the same device, but of all the names one might choose for the neutral register, "topic" is least appropriate in this instance. Something like focus or comment would be more to the point. What, then, of datives? Consider "He gave Fido a bone" and "He gave Fido to Mary". The problem here is the noun phrase following the verb. In neither case can we convincingly argue that it is either the topic or the focus of the sentence.

4. FUNCTIONAL SENTENCE PERSPECTIVE

The most satisfying solution to these problems is to be found in the work of the Prague school of linguists, particularly Mathesius, Firbas, Danes, and Sgall. The basic notion is that of the Functional Sentence Perspective according to which topic and focus are two regions in the scale of communicative dynamism along which each of the major constituents of a sentence is ordered. In the unmarked case, each succeeding constituent in the surface string has a higher degree of communicative dynamism. The point on the scale at which one passes from topic to focus may or may not be marked. In speech, special stress can be used to mark any element as the focus; in writing, several devices like clefting fill the same role. Communicative dynamism correlates with a number of other notions that are more familiar in this part of the world. Elements that are low on this scale are the ones that are more contextually bound, which is to say that they involve presuppositions about the preceding text. In "It was Brutus that killed Caesar", "that killed Caesar" is the topic and it clearly involves the presupposition that someone killed Caesar. In an unmarked sentence, like "Brutus killed Caesar", it is not clear whether the dividing line between topic and comment falls before or after the verb; there are nevertheless three degrees of communicative dynamism involved.
According to this view, the difference between "He gave Fido to Mary" and "He gave Mary Fido" is not in what is topic and what is focus but simply in the positions that "Mary" and "Fido" occupy on the scale of communicative dynamism. Consider the sentences:

(1) John did all the work, but they gave the reward to Bill.
(2) John did all the work, but they gave Bill the reward.
(3) They were so impressed with the work that they gave Bill a reward.
(4) They were so impressed with the work that they gave a reward to Bill.

I claim that (2) and (4) are less natural than (1) and (3) when read with even intonation. Sentence (5), with underlining for stress, is, of course, quite natural, and (6) is questionable.

(5) John did all the work, but they gave Bill the reward.
(6) They were so impressed with the work that they gave a reward to Bill.

The claim is simply that the last item carries the greatest communicative load, represents the most novel component of the sentence. This is consistent with the observation that dative movement is at best awkward when the direct object is a pronoun, as in

(7) I gave him it.

and it becomes more awkward when the indirect object is more ponderous, as in

(8) I gave the man you said you had seen it.

In fact, it is consistent with the observation that ponderous constituents tend to be deferred, using such devices as extraposition. It is in the nature of pronouns that they are contextually bound, and the complexity of large constituents presumably comes directly from the fact that they tend to convey new information.

What this suggests is a formalism in which the structure of a phrase is a list of attributes named for grammatical functions, whose values are words or other phrases. They are ordered so as to show their positions on the scale of communicative dynamism and there is provision for a marker to be introduced into the list explicitly separating the topic from the focus. Considering only the sentence level, and simplifying greatly, this would give examples like the following, using "/" as the marker:

[Subject:John Verb:gave Dir-obj:(the candy) Indir-obj:Mary] => "John gave the candy to Mary"
[Indir-obj:Mary Verb:gave Dir-obj:(the candy) Subject:John] => "Mary was given the candy by John"
[Verb:gave Dir-obj:(the candy) Indir-obj:Mary / Subject:John] => "It was John that gave Mary the candy" or "John gave Mary the candy"
[Subject:John Verb:gave Dir-obj:(the candy) / Indir-obj:Mary] => "It was Mary that John gave the candy to"
[Subject:John Dir-obj:(the candy) / Verb:gave Indir-obj:Mary] => "What John did with the candy was give it to Mary"

The implications for reversible syntactic processing seem to be as follows: The familiar set of registers, named for the most part for the names of grammatical functions, are supplemented by three others called topic, focus and, say, marker. Marker will have a value only when the sentence is marked in the sense I have been using. Topic and focus will contain ordered lists of elements. The structure of a passive sentence, for example, will be recognizable by the fact that it is unmarked and has a patient (dative, or whatever) as the first item on its topic list. The parser will place the first noun phrase in a "standard" sentence on this list and only copy it into some other register later. The generator will unload the first item into the string and decide later what form of verb to produce.

The ill-formedness of the ideas I have tried to present here is clear for all to see. I have so far acquired only the most tenuous grasp of what the Czech linguists are doing and, while I should publicly thank Petr Sgall for his patience in explaining it to me, it is only right that I should also apologise for the egregious errors I have doubtless been guilty of. But, whatever errors of detail there may be, one important point will, I hope, remain. The notions of topic and focus are clearly well motivated in theoretical linguistics, and the richer notion of functional sentence perspective probably is also. I have been led to these same notions for purely technical reasons arising out of my desire to build a reversible syntactic processor.

REFERENCES

Firbas, J. "On Defining the Theme in Functional Sentence Analysis", Travaux Linguistiques de Prague, Vol. 1, pp. 267-280, 1964.
Kaplan, Ronald M. "A General Syntactic Processor", in Randall Rustin (ed.), "Natural Language Processing", New York, Algorithmics Press, 1973.
Mathesius, V. "Zur Satzperspektive im modernen Englisch", Archiv fuer das Studium der neueren Sprachen und Literaturen, Vol. 155, pp. 202-210, 1929.
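The ordered attribute lists used in the examples of Section 4 lend themselves to a small experiment. The Python sketch below is only a toy under stated assumptions, not the formalism itself: it covers unmarked sentences with the single verb "gave", the lexicon `PASSIVE` and the functions `generate` and `parse` are hypothetical, every sentence is assumed to contain a subject, a direct object and an indirect object, and the "/" marker (and hence clefting) is not handled at all. What it tries to show is that the generator can unload the first item before choosing the form of the verb, and that the parser can hold the first noun phrase back and decide its function only at the end.

```python
import re
from typing import List, Tuple

Structure = List[Tuple[str, str]]   # ordered (grammatical function, value) pairs

PASSIVE = {"gave": "was given"}     # hypothetical one-verb lexicon


def generate(structure: Structure) -> str:
    """Unload the items left to right; the first item decides the voice."""
    passive = structure[0][0] != "Subject"
    words, seen = [], set()
    for function, value in structure:
        if function == "Verb":
            words.append(PASSIVE[value] if passive else value)
        elif function == "Subject" and passive:
            words.append("by " + value)
        elif function == "Indir-obj" and "Dir-obj" in seen:
            words.append("to " + value)
        else:
            words.append(value)
        seen.add(function)
    return " ".join(words)


def parse(sentence: str) -> Structure:
    """Hold the first noun phrase back and decide its function last."""
    passive = " was given " in sentence
    first, rest = sentence.split(" was given " if passive else " gave ")
    # Noun phrases after the verb, each optionally introduced by "to" or "by".
    found = re.findall(r"(to |by )?(the \w+|\w+)", rest)
    bare = [phrase for prep, phrase in found if not prep]
    after = []
    for prep, phrase in found:
        if prep == "to ":
            after.append(("Indir-obj", phrase))
        elif prep == "by ":
            after.append(("Subject", phrase))
        elif len(bare) == 2 and phrase == bare[0]:
            after.append(("Indir-obj", phrase))   # "gave Mary the candy"
        else:
            after.append(("Dir-obj", phrase))
    # The first noun phrase takes whichever function is still unfilled.
    (missing,) = {"Subject", "Dir-obj", "Indir-obj"} - {f for f, _ in after}
    return [(missing, first), ("Verb", "gave")] + after


active = [("Subject", "John"), ("Verb", "gave"),
          ("Dir-obj", "the candy"), ("Indir-obj", "Mary")]
passive = [("Indir-obj", "Mary"), ("Verb", "gave"),
           ("Dir-obj", "the candy"), ("Subject", "John")]
```

With these two structures, `generate` yields "John gave the candy to Mary" and "Mary was given the candy by John", and `parse` maps each sentence back to the list it came from, with no passive marker stored in either structure.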