An Improved Heuristic for Ellipsis Processing*
Ralph M. Weischedel
Department of Computer & Information Sciences
University of Delaware
Newark, Delaware 19711

and

Norman K. Sondheimer
Software Research
Sperry Univac MS 2G3
Blue Bell, Pennsylvania 19424

*This material is based upon work partially supported by the National Science Foundation under Grant No. IST-8009673.

1. Introduction

Robust response to ellipsis (fragmentary sentences) is essential to acceptable natural language interfaces. For instance, an experiment with the REL English query system showed 10% elliptical input (Thompson, 1980).

In Quirk et al. (1972), three types of contextual ellipsis have been identified:

1. repetition, if the utterance is a fragment of the previous sentence.
2. replacement, if the input replaces a structure in the previous sentence.
3. expansion, if the input adds a new type of structure to those used in the previous sentence.

Instances of the three types appear in the following example.

Were you angry?
a) I was. (repetition with change in person)
b) Furious. (replacement)
c) Probably. (expansion)
d) For a time. (expansion)
e) Very. (expansion)
f) I did not want to be. (expansion)
g) Yesterday I was. (expansion & repetition)

In addition to appearing as answers following questions, any of the three types can appear in questions following statements, statements following statements, or in the utterances of a single speaker.

This paper presents a method of automatically interpreting ellipsis based on dialogue context. Our method expands on previous work by allowing for expansion ellipsis and by allowing for all combinations of statement following question, question following statement, question following question, etc.

2. Related Work

Several natural language systems (e.g., Bobrow et al., 1977; Hendrix et al., 1978; Kwasny and Sondheimer, 1979) include heuristics for replacement and repetition ellipsis, but not expansion ellipsis. One general strategy has been to substitute fragments into the analysis of the previous input, e.g., substituting parse trees of the elliptical input into the parse trees of the previous input in LIFER (Hendrix et al., 1978). This only applies to inputs of the same type, e.g., repeated questions.

Allen (1979) deals with some examples of expansion ellipsis by fitting a parsed elliptical input into a model of the speaker's plan. This is similar to other methods that interpret fragments by placing them into prepared fields in frames or case slots (Schank et al., 1980; Hayes and Mouradian, 1980; Waltz, 1978). This approach seems most applicable to limited-domain systems.

3. The Heuristic

There are three aspects to our solution: a mechanism for repetition and replacement ellipsis, an extension for inputs of different types, such as fragmentary answers to questions, and an extension for expansion ellipsis.

3.1 Repetition and Replacement

As noted above, repetition and replacement ellipsis can be viewed as substitution in the previous form. We have implemented this notion in an augmented transition network (ATN) grammar interpreter with the assumption that the "previous form" is the complete ATN path that parsed the previous input and that the lexical items consumed along that path are associated with the arcs that consumed them. In ellipsis mode, the ATN interpreter executes the path using the elliptical input in the following way:
1. Words from the elliptical input, i.e., the current input, may be consumed along the path at any point.
2. Any arc requiring a word not found in the current input may be traversed using the lexical item associated with the arc from the previous input.
3. However, once the path consumes the first word from the elliptical input, all words from the elliptical input must be consumed before an arc can use a word from the previous input.
4. Traversing a PUSH arc may be accomplished either by following the subpath of the previous input or by finding any constituent of the required type in the current input. The entire ATN can be used in these cases.

Suppose that the path for "Were you angry?" is given by Table 1. Square brackets are used to indicate subpaths resulting from PUSHes. "..." indicates tests and actions which are irrelevant to the current discussion.

Old State   Arc                          Lexical Item
S           (CAT COPULA ... (TO Sx))     "were"
Sx          (PUSH NP ... (TO Sy))
[NP         (CAT PRO ... (TO NPa))       "you"
NPa         (POP ...)                    ]
Sy          (CAT ADJ ... (TO Sz))        "angry"
Sz          (POP ...)

Table 1. An ATN Path for "Were you Angry?"

An elliptical input of "Was he?" following "Were you angry?" could be understood by traversing all of the arcs as in Table 1. Following point 1 above, "was" and "he" would be substituted for "were" and "you". Following point 3, in traversing the arc (CAT ADJ ... (TO Sz)) the lexical item "angry" from the previous input would be used. Item 4 is illustrated by an elliptical input of "Was the old man?"; this is understood by traversing the arcs at the S level of Table 1, but using the appropriate path in the NP network to parse "the old man".

3.2 Transformations of the Previous Form

While the approach illustrated in Section 3.1 is useful in a data base query environment where elliptical input typically is a modification of the previous query, it does not account for elliptical statements following questions, elliptical questions following statements, etc. Our approach to the problem is to write a set of transformations which map the parse path of a question (e.g., Table 1) into an expected parse path for a declarative response, and the parse path for a declarative into a path for an expected question, etc.

The left-hand side of a transformation is a pattern which is matched against the ATN path of the previous utterance. Pattern elements include literals referring to arcs, variables which match a single arc or embedded path, variables which match zero or more arcs, and sets of alternatives. It is straightforward to construct a discrimination net corresponding to all left-hand sides for efficiently finding what patterns match the ATN path of the previous sentence. The right-hand side of a transformation is a pattern which constructs an expected path. The form of the pattern on the right-hand side is a list of references to states, arcs, and lexical entries. Such references can be made through items matched on the left-hand side or by explicit construction of literal path elements.

Our technique is to restrict the mapping such that any expected parse path is generated by applying only one transformation and applying it only once. A special feature of our transformational system is the automatic allowance for dialogue deixis. An expected parse path for the answer to "Were you angry?" is given in Table 2. Note that in Table 2, "you" has become "I" and "were" has become "was".

Old State   Arc                          Lexical Item
S           (PUSH NP ... (TO Sa))
[NP         (CAT PRO ... (TO NPa))       "I"
NPa         (POP ...)                    ]
Sa          (CAT COPULA ... (TO Sy))     "was"
Sy          (CAT ADJ ... (TO Sz))        "angry"
Sz          (POP ...)

Table 2. Declarative for the expected answer for "Were you angry?".

Using this path, the ellipsis interpreter described in Section 3.1 would understand the ellipses in "a)" and "b)" below, in the same way as "a')" and "b')".

a) I was.
a') I was angry.
b) My spouse was.
b') My spouse was angry.
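The ellipsis-mode rules of Section 3.1 and the question-to-answer mapping of Section 3.2 can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the authors' ATN interpreter: the names `Arc`, `LEXICON`, `np_ends`, `interpret`, `DEIXIS`, `AGREEMENT`, and `expected_answer_path` are hypothetical, the NP subpath is collapsed into its PUSH arc, the NP grammar is a toy one, and a single hand-written function stands in for the pattern-based transformation system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Arc:
    kind: str                   # "CAT" (one word) or "PUSH" (a whole constituent)
    label: str                  # lexical category, or constituent type for PUSH
    previous: Tuple[str, ...]   # words this arc consumed in the previous input


# Hypothetical toy lexicon; the paper's grammar and dictionary are not shown.
LEXICON: Dict[str, Set[str]] = {
    "i": {"PRO"}, "you": {"PRO"}, "he": {"PRO"},
    "was": {"COPULA"}, "were": {"COPULA"},
    "angry": {"ADJ"}, "furious": {"ADJ"}, "old": {"ADJ"},
    "my": {"DET"}, "the": {"DET"}, "man": {"N"}, "spouse": {"N"},
}


def cats(word: str) -> Set[str]:
    return LEXICON.get(word.lower(), set())


def np_ends(words: List[str], i: int) -> List[int]:
    """Every j such that words[i:j] is an NP under a toy grammar: PRO | DET? ADJ* N."""
    ends = []
    if i < len(words) and "PRO" in cats(words[i]):
        ends.append(i + 1)
    j = i
    if j < len(words) and "DET" in cats(words[j]):
        j += 1
    while j < len(words) and "ADJ" in cats(words[j]):
        j += 1
    if j < len(words) and "N" in cats(words[j]):
        ends.append(j + 1)
    return ends


def interpret(path: List[Arc], words: List[str]) -> Optional[List[str]]:
    """Complete an elliptical input against a previous (or expected) path."""
    n = len(words)

    def walk(k: int, i: int) -> Optional[List[str]]:
        if k == len(path):
            return [] if i == n else None   # every input word must be used
        arc = path[k]
        # Rules 1 and 4: let this arc consume words from the elliptical input.
        if i < n:
            if arc.kind == "CAT":
                ends = [i + 1] if arc.label in cats(words[i]) else []
            else:
                ends = np_ends(words, i) if arc.label == "NP" else []
            for j in ends:
                rest = walk(k + 1, j)
                if rest is not None:
                    return words[i:j] + rest
        # Rules 2 and 3: reuse the previous words, but never in the middle
        # of the elliptical input (it must be consumed contiguously).
        if i == 0 or i == n:
            rest = walk(k + 1, i)
            if rest is not None:
                return list(arc.previous) + rest
        return None

    return walk(0, 0)


DEIXIS = {"you": "I", "I": "you"}                 # speaker/hearer swap
AGREEMENT = {("I", "were"): "was", ("you", "was"): "were"}


def expected_answer_path(question: List[Arc]) -> Optional[List[Arc]]:
    """One hand-written transformation: COPULA NP rest -> NP COPULA rest."""
    if len(question) < 2:
        return None
    copula, subject, *rest = question
    if (copula.kind, copula.label, subject.kind) != ("CAT", "COPULA", "PUSH"):
        return None                                # pattern does not match
    new_subject = tuple(DEIXIS.get(w, w) for w in subject.previous)
    verb = copula.previous[0]
    verb = AGREEMENT.get((new_subject[0], verb), verb)
    return [Arc("PUSH", subject.label, new_subject),
            Arc("CAT", "COPULA", (verb,))] + rest


# Table 1, with the NP subpath collapsed into its PUSH arc.
TABLE_1 = [Arc("CAT", "COPULA", ("were",)),
           Arc("PUSH", "NP", ("you",)),
           Arc("CAT", "ADJ", ("angry",))]
TABLE_2 = expected_answer_path(TABLE_1)

print(interpret(TABLE_1, ["Was", "he"]))            # ['Was', 'he', 'angry']
print(interpret(TABLE_2, ["I", "was"]))             # ['I', 'was', 'angry']
print(interpret(TABLE_2, ["My", "spouse", "was"]))  # ['My', 'spouse', 'was', 'angry']
```

In this sketch the contiguity requirement of rule 3 is the test `i == 0 or i == n`: a previous lexical item may be reused only before the first or after the last word of the elliptical input. An input such as "I angry" is therefore rejected, because "was" would have to be supplied in the middle of it.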
3.3 Expansions

A large class of expansions are simple adjuncts, such as examples c, d, e, and g in Section 1. We have handled this by building our ellipsis interpreter to allow departing from the base path at designated states to consume an adjunct from the input string. We mark states in the grammar where adjuncts can occur. For each such state, we list a set of linear (though possibly cyclic) paths, called "expansion paths". Our interpreter as implemented allows departures from the base path at any state so marked in the grammar; it follows expansion paths by consuming words from the input string, and must return to a state on the base form. Each of the examples in c, d, e, and g of Section 1 can be handled by expansion paths only one arc long. They are given in Table 3.

Initial State   Expansion Path                               Example
S               (PUSH ADVERB ... (TO S))                     Probably (I was angry).
S               (PUSH PP ... (TO S))                         For a time (I was angry).
S               (PUSH NP (* this includes a test that        Yesterday (I was angry).
                the NP is one of time or place)
                ... (TO S))
Sy              (PUSH INTENSIFIER-ADVERB ... (TO Sy))        (I was) very (angry).

Table 3. Expansion Paths

Since this is an extension to the ellipsis interpreter, combinations of repetition, replacement, and expansion can all be handled by the one mechanism. For instance, in response to "Were you angry?", "Yesterday you were (angry)" would be treated using the expansion and replacement mechanisms.

4. Special Cases and Limitations

While it may be possible to view all contextual ellipsis as combinations of the operations repetition, replacement, and expansion applied to something, our model makes the strong assumption that these operations may be viewed as applying to an ATN path rather straightforwardly related to the previous utterance. Not all expansions can be viewed that way, as example f in Section 1 illustrates. Also, answers of "No" require special processing; that response in answer to "Were you angry?" should not be interpreted as "No, I was angry." One should be able to account for such examples within the heuristic described in this paper, perhaps by allowing the transformation system described in Section 3.2 to be completely general rather than strongly restricted to one and only one transformation application. However, we propose handling such cases by special purpose rules we are developing. These rules for the special cases, plus the mechanism described in Section 3, together will be formally equivalent in predictive power to a grammar for elliptical forms.

Though the heuristic is independent of the individual grammar, designating expansion paths and transformations obviously is not. The grammar may make this an easy or difficult task. For instance, in the grammar we are using, a subnetwork that collects all tense, aspect, and modality elements would simplify some of the transformations and expansion paths.

The ideal model of contextual ellipsis would correctly predict what are appropriate elliptical forms in context, what their interpretation is, and what forms are not meaningful in context. We believe this requires structural restrictions, semantic constraints, and a model of the goals of the speaker. Our heuristic does not meet these criteria in a number of cases.

Only two classes of structural constraints are captured. One relates the ellipsis to the previous form as a combination of repetition, replacement, and expansion. The other constraint is that the input must be consumed as a contiguous string. This constraint is violated, for instance, in "I was (angry) yesterday" as a response to "Were you angry?" Nevertheless, the constraint is computationally useful, since allowing arbitrary gaps in consuming the elliptical input produces a very large space of correct interpretations. A ludicrous example is the following question and elliptical response:

Has the boss given our mutual friend a raise?
A fat raise.

Allowing arbitrary gaps between the substrings of the ellipsis allows an interpretation such as "A (boss has given our) fat (friend a) raise."

Naturally, semantics must play an important part in ellipsis processing. Consider the utterance pair below:
Did the boss have a martini at lunch?
Some wine.

Though syntactically this could be interpreted either as "Some wine (did have a martini at lunch)", "(The boss did have) some wine (at lunch)", or "(The boss did have a martini at) some wine", semantics should prefer the second reading. We are testing our heuristic using the RUS grammar (Bobrow, 1978), which has frequent calls from the grammar requesting that the semantic component decide whether to build a semantic interpretation for the partial parse found or to veto that partial parse. This should aid performance.

5. Summary and Conclusion

There are three aspects to our solution: a mechanism for repetition and replacement ellipsis, an extension for inputs of different types, such as fragmentary answers to questions, and an extension for expansion ellipsis.

Our heuristic deals with the three types of contextual ellipsis as follows: Repetition ellipsis is processed by repeating specific parts of a transformed previous path using the same phrases as in the transformed form ("I was angry"). Replacement ellipsis is processed by substituting the elliptical input for contiguous constituents on a transformed previous path. Expansion ellipsis may be processed by taking specially marked paths that detour from a given state in that path. Combinations of the three types of ellipsis are represented by combinations of the three variations in a transformed previous path.

There are two contributions of the work. First, our method allows for expansion ellipsis. Second, it accounts for combinations of previous sentence form and elided form, e.g., statement following question, question following statement, question following question. Furthermore, the method works without any constraints on the ATN grammar. The heuristics carry over to formalisms similar to the ATN, such as context-free grammars and augmented phrase structure grammars.

Our study of ellipsis is part of a much broader framework we are developing for processing syntactically and/or semantically ill-formed input; see Weischedel and Sondheimer (1981).

Acknowledgement

Much credit is due to Amir Razi for his programming assistance.

References

Allen, James F., "A Plan-Based Approach to Speech Act Recognition", Ph.D. Thesis, Dept. of Computer Science, University of Toronto, Toronto, Canada, 1979.

Bobrow, D., R. Kaplan, M. Kay, D. Norman, H. Thompson and T. Winograd, "GUS, A Frame-driven Dialog System", Artificial Intelligence, 8, (1977), 155-173.

Bobrow, R., "The RUS System", in Research in Natural Language Understanding, by B. Webber and R. Bobrow, BBN Report No. 3878, Bolt Beranek and Newman, Inc., Cambridge, MA, 1978.

Hayes, P. and G. Mouradian, "Flexible Parsing", in Proc. of the 18th Annual Meeting of the Assoc. for Comp. Ling., Philadelphia, June, 1980, 97-103.

Hendrix, G., E. Sacerdoti, D. Sagalowicz and J. Slocum, "Developing a Natural Language Interface to Complex Data", ACM Trans. on Database Sys., 3, 2, (1978), 105-147.

Kwasny, S. and N. Sondheimer, "Ungrammaticality and Extragrammaticality in Natural Language Understanding Systems", in Proc. of the 17th Annual Meeting of the Assoc. for Comp. Ling., San Diego, August, 1979, 19-23.

Quirk, R., S. Greenbaum, G. Leech and J. Svartvik, A Grammar of Contemporary English, Seminar Press, New York, 1972.

Schank, R., M. Lebowitz and L. Birnbaum, "An Integrated Understander", American Journal of Comp. Ling., 6, 1, (1980), 13-30.

Thompson, B. H., "Linguistic Analysis of Natural Language Communication with Computers", Proceedings of the Eighth International Conference on Computational Linguistics, Tokyo, October, 1980, 190-201.

Waltz, D., "An English Language Question Answering System for a Large Relational Database", Comm. ACM, 21, 7, (1978), 526-559.

Weischedel, Ralph M. and Norman K. Sondheimer, "A Framework for Processing Ill-Formed Input", Technical Report, Dept. of Computer & Information Sciences, University of Delaware, Newark, DE, 1981.