Workshop SDRT, TALN-04, Fès, 22 avril 2004
Extracting and Using Discourse Structure to Resolve Anaphoric Dependencies: Combining Logico-Semantic and Statistical Approaches Nicholas Asher (1), Pascal Denis (2), Jonas Kuhn (2), Erik Larson (1), Eric McCready (2), Alexis Palmer (2), Brian Reese (2), Linton Wang (1) (1) Philosophy Department - University of Texas at Austin
[email protected],
[email protected] (2) Linguistics Department - University of Texas at Austin {denis|jonask|mccready|alexispalmer|bjreese}@mail.utexas.edu
Mots-clefs – Keywords
SDRT, XLE, méthodes statistiques, anaphore pronominale et associative
SDRT, XLE, statistical methods, pronominal anaphora and bridging
Résumé – Abstract
Cet article décrit un projet ayant pour objectif d'exploiter en tandem les apports théoriques de la SDRT et les techniques statistiques les plus récentes. Ce projet s'appuie sur le parser XLE (Palo Alto Research Center; PARC) en vue d'implémenter un système de résolution d'anaphores fondé sur le discours.
This paper describes a project that proposes to exploit the theoretical insights of SDRT and state-of-the-art statistical techniques. The project builds on the XLE parser (Palo Alto Research Center; PARC) to implement a discourse-based anaphora resolution system.
We would like to thank our two reviewers for their comments.

1 Overview

Segmented Discourse Representation Theory (SDRT; (Asher, 1993), (Asher & Lascarides, 2003)) is an approach to discourse interpretation with several advantages for NLP research and applications. One of SDRT's advantages is that it separates a computationally tractable notion of information packaging from information content. Further, the theory of information packaging exploits the notion of an underspecified representation at several levels: at the level of compositional and lexical semantics as well as at the level of discourse logical form. Recent work by (Schlangen & Lascarides, 2002) and (Asher & Lascarides, 2003) shows how HPSG grammars like the English Resource Grammar of (Copestake & Flickinger, 2000)
can provide suitable inputs to fragments of an SDRT implementation. But other implementations, with perhaps shallower treatments of logical form, are also possible; in particular, one can use LFG's f-structures from the wide-coverage XLE parser (see Section 3). SDRT's exploitation of underspecification at the level of discourse logical form enables us to implement the theory at different levels of processing. A shallower level of processing will yield a more underspecified logical form for discourse; a deeper level will yield a more complete one. Further, even if we cannot compute a discourse relation for a particular discourse unit in SDRT, we can still proceed with the anaphora resolution tasks; SDRT's modularity also allows us to exploit statistical techniques along with declarative axioms to compute discourse structure.

We set out here a plan to build a robust and largely domain-independent NLP tool by adopting the following goals: (i) to provide an underspecified discourse logical form (DLF) that includes partial information about rhetorical links in the text, using both the XLE parser and an adaptation of RUDI (Schlangen & Lascarides, 2002); (ii) to resolve definite descriptions and other anaphoric expressions in an underspecified logical form for individual sentences; (iii) to resolve temporal relations between clauses. A longer-term goal is to identify events across various stories. We believe that computing these types of links will improve summarization, question answering, and various information extraction tasks.
2 General Methodology and System Architecture

Two major problems beset previous implementation efforts for theories of discourse semantics: (i) the NLP components providing the input (in particular, syntactic analysis of sufficient depth) were brittle; (ii) the discourse-level analysis aimed at was extremely deep (motivated by the goal of exploring several non-trivial aspects of the semantic/pragmatic theory). Both factors led to highly domain-specific systems, which had little impact on open-domain, broad-coverage NLP. Recent advances in NLP and computational syntax have led to robust deep syntactic parsers. By adding SDRT's modularity and use of underspecification, we believe we can minimize these problems, modulo some additional assumptions. First, one has to give up a strict pipeline architecture of deep NLP components (which provide high-quality results when successful, but which lack robustness) in favor of a multi-strand architecture of deep, shallow, and intermediate components combined opportunistically. Second, one has to adopt a goal-oriented strategy, i.e., concentrate the system's effort on building and disambiguating structures that are crucially required for the system task (in our case, mainly the resolution of various types of anaphora). Third, one combines domain-knowledge-based inferencing techniques (indispensable for achieving high-quality results for a core domain application) with shallower fallback strategies. Finally, one exploits large text corpora for training domain-independent patterns underlying relevant discourse phenomena.

Our design for a system for incremental sentence-by-sentence discourse processing accords with these methodological assumptions (see Figure 1). The overall system input is a document (or a set of documents); the intermediate representation and output is a partially underspecified DLF, with definite descriptions and other anaphoric relations resolved. The system proceeds
incrementally (sentence by sentence) through the input document. Our overall system task has two main subtasks: (A) the detection of referential items and discourse units and of the features of these units; and (B) the determination of relations among the detected items, based on those features, for coreference resolution (B-1) and determination of rhetorical relations (B-2). For each of these tasks we use a combination of modules of different depth/robustness: task (A) is addressed by a named entity recognizer and by a deep syntactic grammar, which itself has a fallback strategy to shallower techniques. The (B) tasks are addressed by a deep SDRT module applying domain knowledge and by various more robust and domain-independent modules: rule-based coreference and rhetorical relation modules, and one (or several) statistical discourse modules.
[Architecture diagram: in part (A), feature detection for referential items and discourse units, a text document passes through preprocessing, XLE parsing, XLE disambiguation, named entity recognition, and feature detection, producing a sentence-by-sentence agenda of referential items and discourse units. In part (B), coreference resolution and determination of rhetorical relations, this agenda and a (partially underspecified) discourse context representation feed a domain-knowledge-based deep discourse module together with statistical and rule-based coreference resolution and rhetorical relations modules; their ranked lists of coreference and discourse unit attachment candidates are combined by a voting module, drawing on additional lexical resources, to yield the discourse logical form.]
Figure 1: Overview of system architecture

We address subtasks (B-1) and (B-2), resolution of anaphors and determination of rhetorical relations, with parallel, interacting modules, using both statistical (e.g., bootstrapping) and rule-based techniques. SDRT requires such interaction, since it is sometimes crucial to do coreference resolution before computing a discourse relation, as in:

(1)
I met an interesting couple yesterday. He worked for the New York Times and she is assistant director in the Soros Foundation.
The rhetorical relation between the sentences cannot be determined without knowing who he and she are. If they are part of the couple, then we can easily compute a discourse relation, and luckily in this case independent information from syntax and semantics dictates this bridging
reference. On the other hand, if we have several possible antecedents for an anaphoric expression, as with the company in example (2) below, then computing discourse structure and rhetorical relations can often improve retrieval of the intended antecedent. We will control the complex interaction of tools and resources with a clear goal-directed agenda: a list of referential and discourse items output by the (A) task. The rhetorical structure of discourse is instrumental in the coreference task, but the system will leave rhetorical structure underspecified to the extent that it is irrelevant for coreference resolution decisions.
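The interleaving illustrated by example (1) can be sketched in code. This is an illustrative toy, not the system's API: all function and field names below are our own, and the gender/group features stand in for what the real system would read off the f-structure.

```python
# Toy sketch: pronouns must be anchored before the rhetorical relation
# between the two sentences of (1) can be labeled.

COUPLE_CONTEXT = [{"text": "an interesting couple", "type": "group"}]

def resolve_pronoun(pronoun, context_entities):
    """Pick an antecedent compatible with the pronoun; bridging to a group
    noun like 'couple' licenses a member-of reading for 'he'/'she'."""
    for ent in reversed(context_entities):          # most recent first
        if ent["type"] == "group" and pronoun in ("he", "she"):
            return {"anchor": ent["text"], "relation": "member-of"}
        if ent.get("gender") == {"he": "m", "she": "f"}.get(pronoun):
            return {"anchor": ent["text"], "relation": "identity"}
    return None

def infer_relation(resolved):
    """Only once the pronouns are anchored can the rhetorical link be
    labeled: if both bridge into the same group, the second sentence
    elaborates on the first."""
    anchors = {r["anchor"] for r in resolved if r}
    if len(anchors) == 1 and all(r["relation"] == "member-of" for r in resolved):
        return "Elaboration"
    return "Continuation"   # weak fallback default

resolved = [resolve_pronoun(p, COUPLE_CONTEXT) for p in ("he", "she")]
print(infer_relation(resolved))   # Elaboration
```

The point of the sketch is the ordering: `infer_relation` cannot fire until `resolve_pronoun` has run, mirroring the dependency described for (1).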
3 Overview of our System

SDRT countenances many discourse relations, but in this project we will concentrate on just a few: Elaboration, Explanation, Commentary, Background, and Continuation. These relations are prevalent in our corpus, and we have several domain-independent ways of inferring them: explicit discourse markers (because for Explanation; verbs of saying for Commentary; also, but, and too for Continuation); aspect shifts in the main verbs of the related constituents (for Background); and default rules about newspaper style (e.g., our newspaper stories typically have a first sentence serving as the 'lead' or 'topic' sentence, with the rest of the first paragraph functioning as an Elaboration of the topic). We exploit, for instance, how newspaper stories are structured in developing simple axioms for Elaboration, and we will be able to infer Background using aspectual information from the XLE parser. However, to compute other instances of rhetorical relations, we need lexical information and world knowledge. (Asher & Lascarides, 2003) rely on detailed lexical information to compute
the predicates crucial for inferring discourse relations, which at present we cannot implement with sufficiently wide coverage. To improve our implementation's behavior in specifying these elements, we will resort both to statistical training over an annotated corpus of such stories and to a deeper inference system based on lexical meaning and domain modeling. Both the collection of this corpus and the development of an in-house automatic annotation tool for both anaphoric expressions (including definites of different kinds) and discourse relations are currently under way (Cresswell et al., 2003). A naive semantics lexicon like WordNet, or WordNet augmented with a domain-specific ontology, furnishes, we think, a good basis for causal and subtype information.

The computational model of our project uses a variety of modules that do not apply in a pipeline sequence, but rather can interact in multiple ways. Ours is a hybrid approach, using both deep symbolic system components and a set of shallower and more robust components. We go beyond a simple two-way split into separate system strands with limited interaction (as was adopted, for instance, in Verbmobil (Wahlster, 2000)): a central project task is to explore innovative techniques for feeding results from subcomponents back into the development cycle of other components. One obvious way to feed insights from theoretical research into statistical NLP components is through accurate annotation of training data. However, this route allows for very limited flexibility once the annotation process has begun; so we explore more dynamic ways of exploiting theoretical insights in a corpus-based training approach, using bootstrapping and active learning techniques.
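The domain-independent cue defaults described at the start of this section could be sketched as an ordered rule list. The cue sets, feature names, and rule order here are illustrative simplifications of the actual axioms:

```python
# Illustrative cue-based defaults for discourse relations; a segment is a
# dict of features that a real system would extract from the parse.

CUE_RULES = [
    ("Explanation",  lambda s: "because" in s["markers"]),
    ("Commentary",   lambda s: s.get("verb_class") == "saying"),
    ("Continuation", lambda s: bool({"also", "but", "too"} & s["markers"])),
    ("Background",   lambda s: s.get("aspect_shift", False)),
    # newspaper-style default: the first paragraph elaborates the lead
    ("Elaboration",  lambda s: s.get("position") == "first_paragraph"),
]

def default_relation(segment):
    """Return the first relation whose cue test fires; otherwise leave
    the relation underspecified (None), as SDRT's underspecified DLF allows."""
    for relation, test in CUE_RULES:
        if test(segment):
            return relation
    return None

print(default_relation({"markers": {"because"}, "position": "body"}))  # Explanation
```

When no cue fires, the relation simply stays underspecified in the DLF, to be filled in (or not) by the deeper or statistical modules.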
We expect that results from the shallower components will influence research on the deeper components; for example, a corpus-based exploratory analysis of the coverage and predicted effect of a particular ‘deep’ component (like the discourse attachment computation module) in
a system context of robust instances of the surrounding components may reveal areas that need further restriction or additional rule options.
3.1 Subtasks for Discourse Item Identification

For the preprocessing of raw text documents, we will rely primarily on existing tools (e.g., (Reynar & Ratnaparkhi, 1997)). The only substantial preprocessing task related to the discourse-semantic core of our project concerns the identification of multi-sentence quotes. Literal quotes receive special treatment in our system: they are entered verbatim into the discourse representation, assigned the relation of Commentary, and annotated with the source of the quote. We will mainly rely on cues like punctuation and verbs of saying for quote identification.

For syntactic analysis, we will apply the XLE parsing system developed by PARC for Lexical Functional Grammar. LFG assumes two main representation structures for deep syntactic analysis: a phrase structure tree (c-structure) and an attribute-value matrix representing dependencies in terms of grammatical functions (f-structure). The f-structure is derived from the lexico-semantic argument structures of words and expresses the predicate-argument structure of a sentence along with morpho-syntactic information. (Van Genabith & Crouch, 1997) argue that LFG's f-structure can serve as a scope-underspecified semantic representation. Thus, the output produced by the ParGram grammars is a very good starting point for a discourse-semantic component (whose focus is on scalability rather than on fine-grained analysis of a small domain). We use an existing NLP tool, originally developed by PARC for transfer in Machine Translation (cf. (Zinsmeister et al., 2002)), to tailor the output f-structure to our purposes. The first task for the transfer mechanism is to enhance the XLE parser output with information regarding identified discourse entities. That is, we will enhance the f-structure output with results from a named entity recognizer (NER), so that the NER designation for each identified entity becomes a feature of that entity in the f-structure.
The NER provides a semantic typing of proper nouns and some common nouns (according to a set of types like Person, Organization, Place, Time, etc.). Next, we augment the representation with lexico-semantic information, using the transfer system to interface with external sources of information. Situation entity detection and classification is a second application of our transfer mechanism. We will first use the lexical conceptual structures (LCSs) of (Dorr, 2001) to determine the lexical and aspectual type of each verbal predicate. WordNet senses and PropBank frames are included in the database representations and may be used in later processing. We use the mechanism to apply to the augmented f-structure representation an ordered battery of linguistic tests designed to pick out and label situation entities. To date we have derived and ordered eighteen separate tests. The key to our characterization of discourse entities is the ordering of the tests: when tests conflict, results from higher-ranking tests are preferred to results from lower-ranking tests, exploiting the transfer system's mechanism of an ordered set of rewrite rules to strengthen the weaker linguistic tests. Such an enhanced f-structure, modified by the transfer mechanism, encodes the presence of discourse entities of various types and is suitable input for the determination of relations between those entities. The representation produced by this part of the system contains such a broad range of lexical, semantic, syntactic, and discourse information that we believe it will be useful for comparison and for applications beyond this project. We will make our representation compatible with the TimeML annotation standard developed by the TimeML Working Group
under ARDA sponsorship (Pustejovsky et al., 2003).
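The ordered-battery idea from Section 3.1 can be sketched as a ranked cascade of tests, where a higher-ranking test overrides anything a lower-ranking one would say. The test names, features, and three-test cascade below are illustrative; the actual system uses eighteen ranked tests over f-structures:

```python
# Illustrative ordered battery of linguistic tests over (simplified) verbal
# features; the first test that fires labels the situation entity.

TESTS = [  # highest-ranking first
    ("state",   lambda v: v.get("lex_aspect") == "stative"),
    ("event",   lambda v: bool(v.get("progressive"))),
    ("generic", lambda v: v.get("tense") == "present" and not v.get("progressive")),
]

def classify_situation(verb_features):
    """Apply tests in rank order; higher-ranked results are preferred when
    tests conflict, mirroring the ordered rewrite rules of the transfer
    system. Fall back to an underspecified label."""
    for label, test in TESTS:
        if test(verb_features):
            return label
    return "underspecified"

print(classify_situation({"lex_aspect": "stative", "tense": "present"}))  # state
```

Note that the stative verb in the example also satisfies the lower-ranked `generic` test; the ordering is what resolves the conflict in favor of `state`.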
3.2 Subtasks for Anaphora Resolution and Rhetorical Relation Detection

Our implementation of SDRT uses various sources of information to guide anaphora resolution tasks (which include bridging inferences): morpho-syntactic information, lexical semantics, discourse structure, and some domain knowledge. Numerous theoretical and computational approaches to intersentential anaphora exist (see (Mitkov, 2002) for an overview). Most theoretical approaches have concentrated on particular factors affecting anaphora resolution; for instance, Centering Theory (Grosz et al., 1995) focuses on identifying syntactic cues, while dynamic semantics, e.g., DRT (Kamp & Reyle, 1993) and DPL (Groenendijk & Stokhof, 1991), has focused on semantic factors like the impact of negation and quantifiers on the accessibility of antecedents. On the other hand, many computational anaphora resolution systems make a number of simplifying assumptions: e.g., considering only full noun phrases as possible antecedents (and in particular only those NPs that appear in the current and preceding sentences), considering only one type of pronoun, or treating only one very specific domain. One important challenge in resolving anaphoric expressions in freely occurring text is to reduce the space of possible antecedents. To tackle this challenge, most extant systems have incorporated a notion of local focus, originally developed by (Sidner, 1979). In addition, for bridging descriptions, recent proposals like (Poesio et al., 1997) have used the WordNet lexical database to compute the bridging relation between a definite description and its anchor. This study revealed that WordNet alone gives rather poor results, both in terms of recall (around 56%, according to (Poesio et al., 1997)) and in terms of precision (less than 30%). (Poesio et al., 1997) report a significant reduction in false positives by using a simple stack-based approach in the spirit of (Sidner, 1979) to reduce the space of accessible anchors.
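The stack-based search, which walks back one sentence at a time and takes the closest compatible anchor, can be sketched as follows. The data shapes and the compatibility callback are illustrative assumptions, not the format of any existing system:

```python
# Illustrative stack-based antecedent search: sentences are kept on a
# stack (newest on top), and the closest compatible NP anchor wins.

def stack_search(definite_head, sentence_stack, is_compatible):
    """Walk back one sentence at a time; return the first NP whose head is
    compatible with the definite description's head (in a real system,
    compatibility would come from WordNet or a domain ontology)."""
    for sentence in reversed(sentence_stack):       # most recent first
        for np in reversed(sentence["nps"]):
            if is_compatible(definite_head, np["head"]):
                return np
    return None

stack = [
    {"nps": [{"head": "ADC", "text": "ADC"}]},                    # early sentence
    {"nps": [{"head": "Mitchell", "text": "Stephen Mitchell"}]},  # latest sentence
]
hit = stack_search("company", stack, lambda d, h: h == "ADC")
print(hit["text"])  # ADC
```

Because the search always prefers the closest compatible anchor, any closer NP that spuriously passes the compatibility test (e.g., another company name in the preceding sentence) would shadow the intended antecedent; that is exactly the failure mode example (2) below exposes.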
Given a definite description in a sentence, such a system searches for the closest antecedent, going back one sentence at a time. A first look at our corpus reveals that this simple stack-based approach may not be the optimal strategy. Consider the newswire example (2), and try to determine the antecedent of the company in its final segment. (2)
ADC Names Stephen Mitchell Vice President of National Accounts ( ) MINNEAPOLIS, Jan 30, 2003 (BUSINESS WIRE) – ADC (Nasdaq:ADCT; www.adc.com) today announced the appointment of Stephen Mitchell as Vice President of National Accounts. ( ) In this role, Mitchell has overall responsibility for driving ADC’s channel sales strategies, including channel development in North America. ( ) He will also have responsibility for several service provider accounts including Sprint and Alltel as well as ADC’s government, original equipment manufacturer (OEM), and broadcast markets. ( ) Based in the company’s Richardson, Texas, office, Mitchell reports to Jay Hilbert, ADC’s senior vice president of Global Sales and Marketing. ( )
The intended referent of the definite is ADC. The closest antecedent (i.e., the optimal antecedent for the stack-based approach) is in the immediately preceding segment: the NP several service provider accounts including Sprint and Alltel as well as ADC's government, original equipment manufacturer (OEM), and broadcast markets contains several proper names that are likely, given some basic domain modeling, to be recognized as companies. However, the actual antecedent for the company is
located much higher up in the discourse context, near the beginning of the story. This example shows that the stack strategy fails on 'long-distance dependencies' like (2). We amend the stack-based approach in two ways. First, we use a much more precise and richer theory of discourse structure, namely SDRT. Second, we will exploit the uniqueness presupposition of definite descriptions. We discuss the benefits of these two strategies below. SDRT's computation of discourse relations has two important side-effects on our task: (i) it determines the set of available attachment points for new information, thereby restricting the space of possible referents for an anaphor, and (ii) it constrains the semantic content of the constituents that are to be connected, sometimes forcing a definite description to have a particular interpretation. Suppose our task is to connect the final segment of (2) to the discourse structure already built for the preceding discourse, and to resolve the description the company in this segment. We can assume that the SDRT axioms described above will give rise to the discourse structure (3) for the segments of (2):
(3) [Discourse structure graph: a hierarchy of Elaboration relations over the segments of (2), with a Continuation between two sister segments]
Note first that one of the possible attachment sites is ruled out because it does not appear on the right periphery of the graph; any antecedent contained in it will have been discarded, which shows that computing discourse structure sometimes does reduce the search space of antecedents. Now we must choose among the three remaining attachment sites. The stack-based approach predicts the closest of these to be the optimal resolution site. Under our approach, that site is ruled out because there is no way to compute a discourse relation between it and the new segment. First, the new segment contains no Continuation cue to suggest the coordinating relation. A second hint suggesting a discourse 'pop' has to do with the way the individual Stephen Mitchell is referred to: although the author uses an anaphoric pronoun earlier, he switches back to the proper name form in the new segment; that is, he moves down the familiarity scale, suggesting a non-local attachment. For choosing between the two remaining sites, we plan to invoke a Lower Attachment Constraint, which forces the new segment to attach as low as possible when all other conditions are met (this in effect preserves some effects of the stack-based approach). This example clearly shows how discourse structure can constrain the resolution of anaphoric material. It is worth noting that this example contains one crucial ingredient that can be missing from easily manufactured examples involving anaphoric reference with definite descriptions: many companies are mentioned, which forces us to exploit the discourse availability constraint directly. As was already noted in (Asher, 1993), however, definite descriptions can sometimes pick out antecedents not on the right frontier. Indeed, they can
sometimes serve to redefine what the right frontier is, forcing an attachment of new information to some currently unavailable attachment point by picking out that constituent explicitly. The presupposition of uniqueness sometimes enables us to search even the whole discourse structure. So, for example, if only one company was mentioned in the previous text of another story, a subsequent use of the definite the company can pick out that company, provided certain constraints of topicality or salience and of "heaviness" of the definite are met. The situation with definites is thus more complex than with pronominal anaphors: we know of no examples, natural or constructed, where pronouns pick up their antecedents from an unavailable discourse constituent.1 Our generalization is that if the uniqueness presupposition of a definite can be satisfied in the whole discourse context, then the antecedent may occur anywhere (modulo possible topicality constraints); if not, the uniqueness presupposition must be satisfied among the available constituents. Our anaphora resolution system, though not yet implemented, will use discourse structure together with lexical semantics. We will make use of the WordNet lexical database, possibly augmented with a domain-specific ontology, which we think furnishes a good basis for the causal and subtype information needed to infer discourse relations. As we noted above, WordNet without discourse structure gives rather poor precision and recall results.
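The availability computation underlying these constraints can be sketched as a walk up the right frontier of the discourse graph. The graph encoding and the pi labels below are our own illustrative simplification (a real SDRT graph also distinguishes coordinating from subordinating relations, and structural relations like Parallel and Contrast further affect availability):

```python
# Illustrative right-frontier computation over a parent map: the available
# attachment points are the last-attached node and its ancestors.

def right_frontier(parents, last_node):
    """Walk from the most recently attached node up through its parents,
    collecting every node on the path; everything off this path is
    unavailable (for pronouns, at least)."""
    frontier, node = [], last_node
    while node is not None:
        frontier.append(node)
        node = parents.get(node)
    return frontier

# Toy graph: pi1 is elaborated by pi2; pi4 was attached last, so its
# closed-off sister pi3 drops out of the frontier.
parents = {"pi4": "pi2", "pi3": "pi2", "pi2": "pi1", "pi1": None}
print(right_frontier(parents, "pi4"))  # ['pi4', 'pi2', 'pi1']
```

Any antecedent inside `pi3` is thereby discarded before coreference resolution even starts, which is exactly the search-space reduction discussed for example (2); a heavy definite description may override this filter, a pronoun may not.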
3.3 The statistical track

Our discourse-semantic system uses a variety of components with partially overlapping functionality, exploiting the various components to the extent that they produce reliable output. The statistical track, which is still programmatic at this stage, will apply state-of-the-art machine learning techniques in which the interaction of confidence-rated information sources is instrumental: specifically, we will adopt the bootstrapping/co-training methodology of weakly supervised learning, which has been very influential in NLP research in the past few years (Yarowsky, 1995; Abney, 2003). This approach allows us to augment a small set of labeled training data (manually annotated for the gold-standard analysis), the seeds, by exploiting a large amount of unlabeled training data. An iterative process picks instances from the unlabeled data for which the present system (exploiting only some of the available information sources) has a high labeling confidence. These additional instances are added to the training set for the next bootstrapping generation of the system (which incorporates more or different information sources for learning). (Ng & Cardie, 2003) have already applied a bootstrapping approach to the coreference task of MUC-6 and MUC-7, but we will apply the technique to a significantly richer set of information sources, and we will address the coreference problem in tandem with the discourse structuring problem. Bootstrapping is an important testbed for the representations and the components of our project. We plan to explore how well the approach is suited to bringing together such diverse components. Given previous work on bootstrapping techniques, we can expect interesting insights from a careful analysis of the role of the choice of learning features in a task like discourse-based coreference resolution.
Our multi-component project architecture puts us in a position to run controlled experiments with various feature sets underlying the same bootstrapping algorithm. A crucial insight from an SDRT-based theoretical approach is that coreference decisions are interleaved with discourse structuring decisions; i.e., when we resolve a discourse attachment/rhetorical relation labeling decision for the current sentence, we
1 Note that structural relations like Parallel and Contrast also affect the notion of availability.
simultaneously narrow down anaphoric relations for the elements introduced in that sentence. Previous machine learning approaches to coreference, such as (Ng & Cardie, 2003), could not exploit this correlation in great depth. We will explore bootstrapping of both coreference resolution and discourse structure attachment/labeling. Two classifiers will be learned from a large pool of unannotated text: a coreference classifier (C) and a discourse relation classifier (D). Both learning processes will be provided with a set of seeds, i.e., a comparatively small set of manually annotated data. (The seed data for each of the two classifiers could be distinct, but we will presumably use the same texts, annotated for both types of information. Since it is not our goal to explore how effective a minimally supervised learning approach is, we do not plan to keep the seed set extremely small; rather, we will use as many labeled data as we can obtain with reasonable effort. One notable source will be the rule-based components of our project, applied to corpus data and vetted manually.) Normally, bootstrapping or co-training is applied to a single classification task for which several information sources exist (potentially split into two views). We will apply such standard single-view bootstrapping processes for (C) and for (D) respectively (exploiting the rich set of features we can provide for the data). However, following SDRT, we will also explore cross-talk between the two bootstrapping processes. In order to apply the bootstrapping technique to two related classification problems, there has to be a mapping between the two classification systems. For our tasks (C) and (D), we cannot specify an exhaustive mapping: knowing the discourse structure of a text is not sufficient for resolving all anaphoric relations, and vice versa. But we can specify hard rules (from SDRT) which narrow down the possibilities for (C), given (D), and vice versa.
These rules can apply in the instance selection step, i.e., when picking additional instances from the pool of unlabeled data. Normally, bootstrapping for the (C) task would rely only on the confidence rating from the previous bootstrapping generation; in the cross-talk scenario, instance selection would be filtered by the predictions of a (D) system predicting certain discourse structure configurations for the same pool of data. Neither of the two systems is free of errors at this point, and the mapping between the two classifications is partial; nevertheless, from the larger pool of instances we may preferentially select those for which there is no contradiction between (C) and (D) and the confidence of both systems is high. It is an empirical question how effective such a cross-talk approach can be; we believe that our project set-up puts us in an excellent position to address this question.
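The cross-talk instance selection step can be sketched as follows. The threshold, the data shapes, and the particular consistency rule are illustrative assumptions, not part of any implemented system:

```python
# Illustrative cross-talk instance selection for the two bootstrapping
# processes: an unlabeled instance joins the next training round only if
# both classifiers are confident AND their predictions do not contradict
# one of the hard SDRT mapping rules.

def select_instances(pool, coref_clf, disc_clf, consistent, threshold=0.9):
    """coref_clf / disc_clf map an instance to (label, confidence)."""
    selected = []
    for inst in pool:
        c_label, c_conf = coref_clf(inst)
        d_label, d_conf = disc_clf(inst)
        if c_conf >= threshold and d_conf >= threshold and consistent(c_label, d_label):
            selected.append((inst, c_label, d_label))
    return selected

# Hypothetical hard rule: an Elaboration attachment is inconsistent with
# resolving the anaphor outside the elaborated constituent.
consistent = lambda c, d: not (d == "Elaboration" and c == "external")
pool = ["i1", "i2"]
coref = {"i1": ("internal", 0.95), "i2": ("external", 0.97)}.get
disc = {"i1": ("Elaboration", 0.92), "i2": ("Elaboration", 0.95)}.get
print(select_instances(pool, coref, disc, consistent))
# [('i1', 'internal', 'Elaboration')]
```

Note that `i2` is rejected despite both classifiers being highly confident: the partial mapping acts as a filter on top of the usual confidence-based selection, which is exactly where the cross-talk enters the standard bootstrapping loop.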
References

Abney, S. (2003). Understanding the Yarowsky algorithm. Ms., University of Michigan.
Asher, N. (1993). Reference to Abstract Objects in Discourse. Kluwer Academic Publishers.
Asher, N. & Lascarides, A. (2003). Logics of Conversation. Cambridge, UK: Cambridge University Press.
Copestake, A. & Flickinger, D. (2000). An open-source grammar development environment and English grammar using HPSG. In Proceedings of the Second Conference on Language Resources and Evaluation (LREC 2000), Athens.
Cresswell, C., Forbes, K., Miltsakaki, E., Prasad, R., Joshi, A. & Webber, B. (2003). Penn Discourse Treebank: Building a large scale annotated corpus encoding DLTAG-based discourse structure and discourse relations. Ms., UPenn.
Dorr, B. (2001). LCS verb database. http://www.umiacs.edu/~bonnie/LCS_Database_Documentation.html.
Groenendijk, J. & Stokhof, M. (1991). Dynamic predicate logic. Linguistics and Philosophy, 14, 39–100.
Grosz, B., Joshi, A. & Weinstein, S. (1995). Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21(2), 203–226.
Kamp, H. & Reyle, U. (1993). From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic Publishers.
Mitkov, R. (2002). Anaphora resolution. In Oxford Handbook of Computational Linguistics, p. 266–283. Oxford: Oxford University Press.
Ng, V. & Cardie, C. (2003). Bootstrapping coreference classifiers with multiple machine learning algorithms. In Conference on Empirical Methods in Natural Language Processing (EMNLP-03).
Poesio, M., Vieira, R. & Teufel, S. (1997). Resolving bridging references in unrestricted text. In Proceedings of the ACL Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts.
Pustejovsky, J., Castano, J., Ingria, R., Sauri, R., Gaizauskas, R., Setzer, A. & Katz, G. (2003). TimeML: Robust specification of event and temporal expressions in text. In ESSLLI 2003 workshop proceedings.
Reynar, J. C. & Ratnaparkhi, A. (1997). A maximum entropy approach to identifying sentence boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, D.C.
Schlangen, D. & Lascarides, A. (2002). CETP: An automated theorem prover for a fragment of commonsense entailment. Technical Report EDI-INF-RR-0119, Division of Informatics, University of Edinburgh.
Sidner, C. L. (1979). Towards a computational treatment of definite anaphora comprehension in English discourse. PhD thesis, MIT, Cambridge, MA.
Van Genabith, J. & Crouch, R. (1997). On interpreting F-structures as UDRSs. In P. R. Cohen & W. Wahlster, Eds., Proceedings of the Thirty-Fifth Annual Meeting of the ACL and Eighth Conference of the EACL, p. 402–409, Somerset, New Jersey: Association for Computational Linguistics.
Wahlster, W., Ed. (2000). Verbmobil: Foundations of Speech-to-Speech Translation. Berlin, Heidelberg, New York: Springer.
Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, p. 189–196, Cambridge, MA.
Zinsmeister, H., Kuhn, J. & Dipper, S. (2002). Utilizing LFG parses for treebank annotation. In Proceedings of the LFG 2002 Conference, Athens.