This paper describes a representation for time relations in narrative. The time relations are based on both explicit sources of time information. (e.g., adverbial ...
REPRESENTING IMPLICIT AND EXPLICIT TIME RELATIONS IN NARRATIVE*
L y n e t t e Hirschman and Guy S t o r y Linguistic String Project New York U n i v e r s i t y New York, New York 10012 ABSTRACT
a n a r r a t i v e can determine t h e sequence of events even i n t h e absence o f e x p l i c i t t i m e i n f o r m a t i o n f o r each e v e n t . To do t h i s , t h e reader makes use of c e r t a i n characteristics of narrative discourse, i n o r d e r t o a s s i g n t i m e r e l a t i o n s t o events whose t i m e i s not e x p l i c i t l y g i v e n . One c h a r a c t e r i s t i c o f connected d i s c o u r s e i s r e f e r e n c e : by i d e n t i f y i n g v a r i o u s mentions o f a n event a s r e f e r r i n g t o t h e same e v e n t , t h e reader a s s o c i a t e s these ment i o n s w i t h t h e same t i m e . The conventions o f n a r r a t i v e discourse also provide c e r t a i n d e f a u l t time r e l a t i o n s between consecutive clauses and/or s e n tences. For example, t h e convention o f " n a r r a t i v e time progression" provides t h a t in n a r r a t i v e , time does not move backward unless an e x p l i c i t t i m e marker i s p r o v i d e d . An algorithm f o r capturing t i m e r e l a t i o n s in n a r r a t i v e must make use both of t h e e x p l i c i t t i m e i n f o r m a t i o n and o f t h e t i m e i n f o r m a t i o n conveyed i m p l i c i t l y by r e f e r e n c e and by t h e d e f a u l t t i m e r e l a t i o n s . T h i s paper w i l l p r e sent such a n a l g o r i t h m and o u t l i n e i t s a p p l i c a t i o n t o a sample t e x t .
T h i s paper d e s c r i b e s a r e p r e s e n t a t i o n f o r t i m e relations in narrative. The t i m e r e l a t i o n s are based on both e x p l i c i t sources of time i n f o r m a t i o n ( e . g . , a d v e r b i a l expressions o r tense) and i n p l i c i t sources, such as m u l t i p l e reference to a s i n g l e event, n a r r a t i v e t i m e p r o g r e s s i o n and e a r l i e r events i m p l i e d by change of s t a t e words. Natural language processing i s used t o analyse t h e i n p u t text i n t o a set of subject-verb-object u n i t s , connected by b i n a r y connectives; these u n i t s c o r r e s pond t o t h e events o f t h e n a r r a t i v e . With each event i s a s s o c i a t e d a t i m e i n r e l a t i o n t o another event, a d j u s t e d by an o p t i o n a l time q u a n t i t y . These t i m e r e l a t i o n s have a n a t u r a l r e p r e s e n t a t i o n as a d i r e c t e d graph whose nodes are time p o i n t s and whose edges are t i m e i n t e r v a l s . The a l g o r i t h m f o r e x t r a c t i n g t h e t i m e r e l a t i o n s from a t e x t i s i l l u s t r a t e d f o r an excerpt from a h o s p i t a l discharge summary. I
Introduction
II
A n understanding o f t i m e r e l a t i o n s i s essent i a l t o understanding n a r r a t i v e . For a computer program t o "understand" n a r r a t i v e , i t must process these t i m e r e l a t i o n s ; t h i s r e q u i r e s both a r e p r e s e n t a t i o n o f t h e types o f t i m e i n f o r m a t i o n t h a t i n terconnect n a r r a t e d e v e n t s , and a n a l g o r i t h m t o e x t r a c t t h i s i n f o r m a t i o n from t h e n a t u r a l language text.
Background
There have been s e v e r a l p r e v i o u s i n v e s t i g a t i o n s o f the representation o f time r e l a t i o n s , a l l w i t h i n t h e c o n t e x t o f question-answering systems. [1] focused o n t h e s p e c i f i c a t i o n o f t h e temporal i n f o r m a t i o n p r o v i d e d by tense and described a p r o gram CHRONOS t h a t accepted a r e s t r i c t e d s e t of Eng l i s h - l i k e input, stored t h i s information and answered simple t i m e - r e l a t e d q u e s t i o n s based on t h e input. Since t h e i n p u t was provided on a sentence-by-sentence b a s i s , t h e issue o f i m p l i c i t time r e l a t i o n s i n n a r r a t i v e d i s c o u r s e d i d not a r i s e i n t h i s work. [2] d e s c r i b e d a r e p r e s e n t a t i o n o f time r e l a t i o n s between s e t s of events • The r e p r e s e n t a t i o n was used in q u e r y i n g a d a t a base c o n s i s t ing of events whose times were s p e c i f i e d (sometimes incompletely) i n terms o f t h e i r d u r a t i o n and i n terms o f beginning and ending times i n r e l a t i o n t o another e v e n t . From t h i s i n f o r m a t i o n , t h e program could determine whether an event preceded, o v e r lapped, o r f o l l o w e d another e v e n t .
The r e p r e s e n t a t i o n o f time r e l a t i o n s i n n a r r a t i v e r a i s e s a number of complex problems. On t h e one hand, t h e r e are a m u l t i t i u d e of t i m e words: prepositions ( e . g . , during, before), adjectives ( c u r r e n t , p r i o r ) , adverbs ( t h e n , now), and nouns ( d u r a t i o n , week). These v a r i o u s words combine i n t o time expressions ( g e n e r a l l y a d v e r b i a l expressions) which m u s t be analysed and reduced to a systematic representation. On t h e o t h e r hand, most events in a n a r r a t i v e do not have an e x p l i c i t t i m e s t a t e d in t h e form of an a d v e r b i a l e x p r e s s i o n . Nonetheless t h e reader of
More r e c e n t l y , [5] and [6] d e s c r i b e d a " t i m e s p e c i a l i s t " program which accepted t i m e i n f o r m a t i o n about events i n a L I S P - l i k e f o r m ; the specialist s t o r e d t h i s i n f o r m a t i o n , checked i t f o r c o n s i s t e n c y , and answered q u e r i e s about t h e events d e s c r i b e d i n t h e i n p u t . Although t h e t i m e s p e c i a l i s t handled only e x p l i c i t time information provided in the i n p u t , t h e r e i s a d i s c u s s i o n i n [ 5 ] about i t s a p -
T h i s research was supported in p a r t by t h e Nat i o n a l L i b r a r y of Medicine g r a n t number LM-02616, awarded b y t h e N a t i o n a l I n s t i t u t e s o f H e a l t h ; in p a r t by t h e N a t i o n a l Science Foundation grant number IST-7920788; and in p a r t by t h e N a t i o n a l Science Foundation g r a n t MCS-8002453.
2 89
p l i c a t i o n to a n a r r a t i v e - l i k e s i t u a t i o n , namely the specification of the time course of a disease and the matching of input data to the specified time course. In contrast to the question-answering context of the work described above, t h i s paper is based on research at the Linguistic String Project on the creation of a data base from natural language input. In our i n i t i a l experiments using medical narrative, it quickly became clear that time r e l a tions between various medical events ( e . g . , symptoms, therapies) were of central importance in capturing the significant information. The time a l gorithm described here is a modification of an earl i e r algorithm [3] developed f o r the analysis of medical records, and the examples in the paper are drawn from actual hospital discharge summaries. The algorithm should, however, be equally applicable to time relations in other narrative domains. In the automatic analysis of the medical narr a t i v e material, various stages of natural language processing preceded the analysis of time information. These stages included 1) a parse, i d e n t i f y ing (surface) syntactic r e l a t i o n s ; 2) transformat i o n a l regularization, mapping more complex sentence types i n t o a "canonical set" of simple subject-verb-object sentence types; 3) a "formatting" component, mapping the analysed sentence i n t o a t a bular representation of the medical information. (See [7] f o r a more detailed description of the stages of natural language processing). These e a r l i e r stages of processing are r e l e vant to the time algorithm because the time algorithm assumes that each sentence is broken down i n t o syntactic units consisting of subject-verb-object combinations corresponding to events; these units are connected by binary connectives, with the main clause preceding the clause subordinated or conjoined to i t . Binary connectives include, among o t h e r s , subordinate conjunction ( e . g . , while in She developed dyspnea while she was in Maryland); co-ordinate conjunction ( e . g . , and in Her elbow became infected and she was t r e a t e d . . . . ) ; and rel a t i v e clauses ( e . g . , She was treated with erythromycin which she took f o r 5 days.). As part of the syntactic analysis, each tense and each exp l i c i t time expression is i d e n t i f i e d and associated with the unit that it modifies. The output of the syntactic processing is thus a sequence of events with associated time information, connected by b i nary connectives. The time algorithm operates sequentially on the set of connected events obtained for a document; it s t a r t s at the beginning of the narrative and assigns to each event a time in terms of an already processed event, before going on to the next event. The emphasis in developing t h i s algorithm has been to use the available syntactic information to i t s f u l l e s t . We have not explored the question of whether the incorporation of domain-specific e x t r a - l i n g u i s t i c knowledge would improve the processing of time ( i t almost c e r t a i n l y would), and whether the kinds of e x t r a - l i n g u i s t i c knowledge needed to do t h i s can be c l e a r l y delimited (a d i f f i c u l t t a s k ) . I t i s our hope that the f u l l u t i l i -
290
zation of l i n g u i s t i c techniques w i l l bring into sharper focus the kinds of information needed to supplement a l i n g u i s t i c a l l y based analysis. Ill
The Representation of Time
It is convenient to analyse the time of an event in terms of i t s r e l a t i o n to the time of another event (the "reference point") adjusted by an optional time quantity (see [6) and 131). The reference point may be a date, which anchors the time to a fixed external scale, or it may be another event within the narrative. This analysis has a natural representation as a directed graph, where the nodes of the graph are the time points associated with the events and the edges represent the intervals between events. The time graph f o r a short three sentence t e x t l e t is shown in F i g . 1. An event generally has a single time point associated with i t , although events with duration have both a beginning and an ending p o i n t . If two events occur at the same time, they w i l l both be associated with the same time point. Thus in the example of F i g . 1, if admission to the hospital occurs on 1/27 and the patient develops a fever on 1/27, then these two events w i l l both be associated with the same time point (point A in the graph of Fig. 1). The time graph is useful in determining the relative order of the events displayed in the graph: an event I precedes an event J i f f there is a path from the time point associated with event I to the time point associated with event J. The sum of the edges along the path gives the time i n t e r v a l between I and J. For example, in the graph of Fig. 1, B precedes A. There are, in f a c t , two paths from B to A. In t h i s case, the sum of the edges of the two paths must be equal: BA = BC + CA. ( I f the length of the two paths were not equal, an i n consistency would be detected). Since BA has a
■ TV rsmmmmamaammimBSsssssssssssssssaaasBSSSSSS ,\ sr Date of a d m i s s i o n : 1/27/xx TEXT: 3 weeks p r i o r to the c u r r e n t a d u l a t i o n , a b r i g h t red patch appeared under the p a t i e n t ' s e y e . The p a t i e n t developed a naculopapular rash over hands and Knees. 0. 1/27 she began having a f e v e r to 104.
291
progression. Sentence 3 has an e x p l i c i t time (1/27), but narrative time progression also applies; t h i s provides the information that the time of sentence 3 is l a t e r than or equal to the time of sentence 2. As a r e s u l t , a lower and an upper bound on the time of sentence 2 are obtained from the discourse context, although sentence 2 has no e x p l i c i t time. These relations are represented by the graph in Pig. 1. A second hypothesis states that the time of a subordinate clause is equal to the time of i t s main clause, provided that no e x p l i c i t time information occurs in the subordinate clause or in the subordinate conjunction. For example, in the sentence A S i n a i tap was performed which revealed 806 red ood c e l l s , the subordinate clause (which revealed red blood cells) is assigned a time equal to that of the main clause (A spinal tap was performed). Co-ordinate conjunction poses some special and d i f f i c u l t problems for interpreting time informat i o n . The algorithm distinguishes two main cases. When the subject or object contains a conjoined noun or adjective phrase, the conjoined phrase is expanded during the transformational stage to y i e l d two events; however, the algorithm assigns them the same time ( s p e c i f i c a l l y the second conjunct is assigned the time of the f i r s t oonjunct). Thus in the sentence The maculopapular rash and the patch under her r i g h t eye gradually diaapppeared, the two events rash disappeared ana patch disappeared are assigned the same time When there are two d i s t i n c t predicates connected by a co-ordinate conjunction, then the two conjoined events behave l i k e two independent sentences, and the algorithm assigns to the second a time l a t e r than or equal to the time of the f i r s t , as in narrative time progression. For example, in the sentence. The maculopapular rash and the patch under her r i g h t eye gradually disappeare and were. gone after the f i r s t twelve hours on therapy, the default timeTnfonnation is used (together with the time information from the e x p l i c i t adverbial after the f i r s t twelve hours on therapy) to determine that the time of the seoon? conjunct (rash and patch were gone) is l a t e r than the time of the first conjunct (rash and patch gradually disappeared); in f a c t , the second conjunct marks the end point of the time period of the f i r s t conjunct, VI
Time Information from Resolution of Reference
discharge, or hospitalisation can be associated with the current hospitalization, then i t s time can be set equal to the date given in the header. For example, if the hospital admission occurred on 1/27, then the 4th dav;of hospitalization w i l l be 1/31, provided that this is a reference to the current hospitalization. Despite the fact that anaphoric reference is more d i f f i c u l t to detect in the note-taking s t y l e of the discharge summaries ( a r t i cles, both d e f i n i t e and i n d e f i n i t e are commonly omi t t e d ) , these references to the current h o s p i t a l i zation are generally easy to resolve because they are the default reference; there are occasional references to previous hospitalizations, but these are flagged by various modifiers, as in the previous hospitalization. The o r i g i n a l program implemented to processtime relations (3) did anaphora resolution only when an event appeared as the reference point in a time expression. However, the current algorithm extracts more complete time information by comparing each new mention of an event (regardless of whether it occurs as a reference point) to previously mentioned events of the same type. When it i d e n t i f i e s two mentions as referring to the same event, then it equates t h e i r times. A simple technique is used for determining sameness of reference: a backward search is performed for events of the appropriate type, namely, events in the same synonym class. When a candidate event is found and i t s f a c t u a l i t y is checked (negated events cannot be antecedents), then i t s time is compared to the time of the current event. If the times are compatible (that i s , no contradiction arises from assuming that these events are the "same"), then they are taken as referring to the same event; otherwise, the backwards search is continued u n t i l a l l events of the appropriate type have been tested. This procedure has been adequate to i d e n t i f y sameness of reference in the hospital discharge summaries, which have a r e l a t i v e l y simple referent i a l structure. For more complex t e x t s , it is clear that a more sophisticated technique for resol u t i o n of anaphora would have to be used. VII Capturing Time Relations in a Sample Text The two sentences below are the t h i r d and fourth sentences (events 9-15) from a paragraph of a hospital discharge summary. 1.
If a single event is mentioned several times within a discourse, it is important to i d e n t i f y these mentions of the event and to equate the times associated with each mention (ignoring, f o r a moment, descriptions of states which can have a duration). For example, the events current hospitalization and current hospital stay have a special signifcance in a hospital discharge summar y . They are associated with specific dates (provided in the header information) and appear as reference points in many time expressions, f o r example, she again spiked fevers on the 4th day of hospitalization, or patient was " a f e b r i l e on discharge . If a given mention of admission.
2.
The patient became a f e b r i l e during the f i r s t day of hospitalization. She again spiked fevers on the 3rd and 4th day of hospitalization, but then became a f e b r i l e and remained a f e b r i l e for the remainder of her hospitalization.
We w i l l use these sentences to i l l u s t r a t e how the algorithm extracts and represents time information from a narrative. In the f i r s t sentence she became a f e b r i l e , there is an i m p l i c i t reference (carried by the change-of-state verb become) to a state when the patient was not a f e b r i l e . This e a r l i e r implied state is entered as event E9; the state of becom-
292
293
narrated t o p i c , namely El3; and it provides an ending (TO) time = date of discharge. The time derived from conjoined predicate time progression (time (E14) >= time (El3) is therefore redundant and is omitted. E15: This event is derived from El4 by expansion of a conjoined adjective in the object: afebrile and asymptomatic; it is therefore taken as having the same time as E14. Fig.
The time graph f o r these sentences is shown in 4.
In the course of computing the time r e l a t i o n s , a l l events of a given type ( f o r example, fever events) are collected i n t o a s e t , in order to 5P c i l i t a t e determination of sameness of reference. This grouping has an additional advantage in that
294
it makes it possible to graph a set of fever events against time. Figure 5 shows a graph of fever against time for the sentences analysed in Figs. 3 and 4. Although it is not possible to determine the exact shape of the curve (or i t s height, since a numerical temperature is*not always given), the graph nonetheless provides an approximate picture of the p a t i e n t ' s periods of fever during the hospit a l stay.
VIII
Conclusion
In t h i s b r i e f discussion, only the broad outl i n e of an algorithm to extract time information from narrative can be sketched. The algorithm described in t h i s paper is a modification of a p r e v i ously developed algorithm [3] that was implemented
and s u c c e s s f u l l y t e s t e d on a s e t of e i g h t h o s p i t a l d i s c h a r g e summaries, each c o n t a i n i n g approximately 5 0 sentences o f t e x t [ 4 ] . The o r i g i n a l program contained a n e x t e n s i v e t r e a t m e n t o f e x p l i c i t t i m e e x p r e s s i o n s , c o v e r i n g same 50 t i m e words. The modi f i c a t i o n s t o t h e o r i g i n a l program have been m a i n l y i n the handling o f i m p l i c i t time r e l a t i o n s .
References Bruce, B . C . , A Model f o r Temporal References and i t s A p p l i c a t i o n in a Question Answering Program, A r t i f i c i a l Intelligence 3, 1-25 (1972). 2.
We a r e now in t h e process of implementing t h e new v e r s i o n o f t h e a l g o r i t h m and are s t i l l i n v e s t i g a t i n g s e v e r a l problems i n t h i s r e g a r d . These a r e : 1) m a i n t a i n i n g a compact r e p r e s e n t a t i o n t h a t p e r m i t s d e t e c t i o n of i n c o n s i s t e n c i e s when a new t i m e r e l a t i o n i s added t o t h e s e t o f known t i m e r e l a t i o n s ; 2 ) r e p r e s e n t i n g continuous s t a t e s a s o p posed to t i m e - p o i n t e v e n t s : a s t a t e may change over t i m e , but can s t i l l b e i d e n t i f i e d a s a s i n g l e state ( e . g . , a f e v e r may r i s e and f a l l but i t i s s t i l l referred to as a single continuing fever); 3) implemening a more s o p h i s t i c a t e d procedure f o r anaphora r e s o l u t i o n , i n c l u d i n g t h e h a n d l i n g o f r e f e r e n c e s t o continuous s t a t e s ; and 4 ) d i s t i n g u i s h i n g between t h e two uses o f d u r a t i o n a l e x p r e s sions: 1 ) t o d e f i n e t h e lower and upper l i m i t s o f a s i n g l e t i m e - p o i n t , as in she had a t r a n s f u s i o n d u r i n g her h o s p i t a l i z a t i o n ; or 2) to define beginm n g and end p o i n t s of a c o n t i n u o u s s t a t e : she was a f e b r i l e d u r i n g her h o s p i t a l i z a t i o n . In adHItion, t h e r e a r e a number of i s s u e s t h a t we have n o t y e t touched o n , f o r example, t h e r e p r e s e n t a t i o n o f i m p r e c i s i o n i n t i m e s p e c i f i c a t i o n s o r t h e use o f extra-linguistic knowledge t o determine t i m e r e l a t i o n s . Nonetheless, t h e g e n e r a l p r i n c i p l e s o u t lined i n t h i s paper appear t o c o n s t i t u t e a n import a n t f i r s t s t e p i n e x t r a c t i n g t i m e i n f o r m a t i o n from n a r r a t i v e in a way t h a t c a p t u r e s many of t h e import a n t r e l a t i o n s between t h e n a r r a t e d e v e n t s .
F i n d l e r , N.V. and D. Chen, R e t r i e v a l of Temp o r a l R e l a t i o n s , C a u s a l i t y , and C o e x i s t e n c e , Intl. J—. Comp. Inf. Sci. 2, 161-185
71973).
295
~~^
-
3.
Hirschman, L . , R e t r i e v i n g Time I n f o r m a t i o n from Natural Language Texts, to appear in I n f o r m a t i o n R e t r i e v a l Research, R.N. Oddy, C.J. Van R i j s b e r g e n , S.E. Robertson and P. W i l l i a m s , e d s . , B u t t e r w o r t h s , London ( 1 9 8 1 ) .
4.
Hirschman, L . , S t o r y , G . , Marsh, E . , Lyman, M., and N. Sager, An Experiment in Automated Hea l t h Care E v a l u a t i o n from N a r r a t i v e Medical Rec o r d s , to appear in Computers and Biomedical Research.
5.
Kahn, K . , Mechanization of Temporal Knowledge, M.I.T. P r o j e c t MAC T e c h n i c a l Report No. 155 (1975).
6.
Kahn, K. and G.A. G o r r y , Mechanizing Temporal Knowledge, A r t i f i c i a l I n t e l l i g e n c e 9 , 87-108 (1977).
7.
Sager, N . , N a t u r a l Language I n f o r m a t i o n Formatting: t h e Automatic Conversion o f T e x t s t o a S t r u c t u r e d Data Base, Advances in Computers 17, 89-162, M.C. Y o v i t s and M. R u b i n o f f , e d s . , Academic P r e s s , New York ( 1 9 7 8 ) .