Extraction of Predicate-Argument Structures from Texts

Sylvain Delisle (1) & Stan Szpakowicz (2)

(1) Département de mathématiques et d'informatique, Université du Québec à Trois-Rivières, Trois-Rivières, Québec, Canada G9A 5H7
email: [email protected]

(2) School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5
email: [email protected]
Abstract

We consider the extraction of predicate-argument structures from a single text with a substantial narrative part. Working with such texts rather than with large corpora requires detailed syntactic analyses, a learning mechanism and a cooperating user who confirms automatically generated results. We describe a system with these capabilities. The system has been tested on a variety of texts and has recently undergone an experimental evaluation. User participation turns out not to be onerous, and a clear learning pattern emerges.

1 Introduction

Work on lexicons has become a central issue in Natural Language Processing (NLP); (Wilks et al. 96; Guthrie et al. 96; Saint-Dizier & Viegas 95) give an up-to-date overview. Despite common acceptance of the pivotal role of lexicons, debates continue on the function, contents and organization of life-size lexicons, and on the methods of their creation and maintenance. In this paper, we address some of these questions and offer a few practical solutions in the context of an implemented system for knowledge acquisition from texts.

Verbs play a fundamental role in NLP, so verb information in lexicons is essential. A verb as a predicate identifies a relation between the entities denoted by its subject and complements. This relation can be represented as an argument structure, a subcategorization frame, or a list of selectional restrictions. Existing machine-readable dictionaries, from which realistic lexicons must be constructed, contain little of the verb information needed for NLP purposes; it must be acquired from other sources. We discuss four characteristics of an acquisition method implemented in the TANKA system (Delisle et al. 96).

Oriented toward applications and knowledge extraction. We value NLP applications and sound NLP engineering principles. Our long-term research goal is a workbench for a knowledge engineer who creates conceptual models of domains described by actual texts. The key measure of success will be minimal reliance on a priori domain knowledge, as argued in (Delisle et al. 96). The TANKA system uses only publicly available lexical data (the Collins dictionary and WordNet) and surface-syntactic features of the text to propose analyses for a cooperating user to approve or adjust. Predicate-argument structure extraction, our main concern in this paper, is a necessary step towards the construction of conceptual models.

Linguistically motivated and non-statistical. Unless one relies heavily on rich a priori domain knowledge, syntactic structures must provide the crucial support for meaning analysis and extraction. A broad-coverage parser is required, of the kind proposed by (Delisle 94; Delisle & Szpakowicz 95); it should ensure almost complete acquisition of predicate-argument structures from text. In contrast with statistically-based approaches (Ribas 95; Grishman et al. 94), TANKA is not restricted to sentences whose verbs occur a statistically significant number of times: it can analyze all sentences of a relatively small text, or those portions of a very large text that interest the user. This also ensures that analysis results contribute to the building of an overt explicative model of the text's domain. Several interesting papers on symbolic versus statistical approaches appear in (Klavans & Resnik 94).

Based on actual texts. Predicate-argument structures can be extracted from machine-readable dictionaries or from texts. We choose texts that are not part of larger corpora, are reasonably well written (so that parsing succeeds often enough) and have long stretches of narrative (to yield many clauses to work with). There are two reasons for this choice. First, verbs behave differently in different domains (Jensen 91; Basili et al. 94), that is, they take different types of complements; this, in turn, is why we hold that every sentence may contain useful semantic data and should therefore be fully processed. Second, dictionaries do not cover all word usages in the ever-growing body of text, particularly on the Internet, and it takes a long time for a new word use to enter traditional dictionaries. Incidentally, predicate-argument structures extracted from texts could even help revise and update existing dictionaries.

Semi-automatic and incremental. The foregoing suggests that predicate-argument structure extraction cannot proceed completely automatically. Case-based semantic analysis, which uses the results of automatic syntactic analysis, normally requires the supervision of a cooperating user: she accepts the system's analysis or, occasionally, adjusts it, and may choose among a few proposed interpretations. TANKA's semantic analyzer HAIKU memorizes its suggestions and the user's decisions to increase its knowledge of verb usages in the text at hand. With more text processed, HAIKU learns to make better suggestions, and the user's contribution shrinks to simple approval. A clear pattern of learning has been demonstrated in recent experiments (Barker & Delisle 96); details appear in Section 5.

2 Related Work

The work reported here uses existing technical texts; for a characterization of such texts, see (Copeck et al. 97). Other knowledge acquisition or extraction work considers other types of corpora: (Poznanski & Sanfilippo 96) use definitions from the Longman Dictionary of Contemporary English, and (Gomez et al. 94) use those of The World Book Encyclopedia. This work also processes actual raw texts, whereas other approaches, such as (Ogonowski et al. 94), consider pre-processed and disambiguated corpora. Others, like (Hastings & Lytinen 94), assume the a priori availability of domain-specific knowledge, while we do not; in fact, our aim is to acquire such knowledge from texts. This work also differs from (Pugeault et al. 94), whose acquisition strategy is based exclusively on partial analysis of text; we view partial (or fragmentary) analysis as a fallback strategy when full analysis fails, and this is how DIPETT works. As reported in (Delisle 96), fragmentary parsing clearly allows less complete knowledge acquisition from text. Another difference from the research of (Pugeault et al. 94) and (Liu & Soo 93) is that the semantic cases assigned by their systems cannot be revised and modified by the user to help the system improve its performance over time.

Now for work concerned specifically with the automatic construction of dictionaries or lexicons. (Cardie 93) shows how to acquire from a corpus the parts of speech and distinct senses of open-category words. (Grishman et al. 94) and (Sanfilippo 94) discuss the problems, and some solutions, associated with the construction of large lexicons for automatic text analysis. (Riloff 93) presents AutoSlog, a system that built a domain-specific dictionary of concepts for the MUC-4 information extraction task. Work on automatic lexicon construction that is closer to ours in being especially (although not exclusively) interested in verb entries is that of (Brent 91; Briscoe & Carroll 97; Framis 94; Grishman & Sterling 94; Manning 93; Poznanski & Sanfilippo 96; Myaeng et al. 94) and (Pugeault et al. 94), who present different approaches to the extraction of subcategorization information, selectional restrictions, or predicate-argument structures from corpus analysis. (Basili et al. 92) and (Sekine et al. 92) show how to acquire semantic collocations from a corpus.

3 The Text Analysis System

Syntactic analysis in TANKA is performed by DIPETT (Domain-Independent Parser of English Technical Texts). This broad-coverage parser, implemented using Definite Clause Grammars (Delisle 94; Delisle & Szpakowicz 95), has rules based on (Quirk et al. 85). DIPETT is not a lexicalist grammatical framework: it does not use subcategorization frames and does not adhere to a specific linguistic theory. Its output consists of a single parse tree, the best analysis according to its surface-based heuristics. Tests on a number of actual, unedited texts have produced either full parses or fragmentary parses (into sequences of phrases) for up to 95% of sentences. The proportion of full parses is higher for carefully edited narratives, lower for such texts as manuals or user guides. A full parse is a first good syntax tree, grammatical but not always semantically perfect. In a recent in-depth evaluation (Barker & Delisle 96) there were 47% semantically perfect parses and 19% with minor errors, including misattached prepositional phrases (PPs); two trees out of three were easily amenable to further semantic processing. This count does not include verb phrases in fragmentary parses that could also be case-analyzed. PP misattachment occurred in 20% of the imperfect parses. Legitimate but unlikely parses may arise from lexical (mostly noun-verb) ambiguity. In TANKA, we do not assume that input texts are free of ambiguity, but the user is expected to help the system select only one parse tree and only one semantic analysis for each input string analyzed.

Imperfect parses hinder, or even preclude, semantic analysis. We have built an interactive reattachment module as part of the READER system (Delisle 96). Using a simple drag-and-drop interface, the user can manipulate certain types of constituents to correct simple misparses (e.g. a misattached prepositional phrase) and supply correct syntactic input to semantic analysis. READER is an umbrella over DIPETT, HAIKU and LEXICOGRAPHER. The latter helps construct text- or corpus-specific lexicons for use by DIPETT. There is a graphical user interface to an indexed text base, lexicons, parse trees, case structures and predicate-argument structures. A text-specific lexicon contains the information necessary to parse all and only the sentences in the given text. An entry points to relevant examples of the word in the text base and, optionally, to a taxonomic category derived from WordNet. LEXICOGRAPHER also computes word frequencies and indexed lists of occurrences of all words in a corpus, and gives access to Brill's tagger (Brill 92).

A sentence, DIPETT's unit of processing, contains a main clause and possibly subordinate clauses. Before semantic analysis begins, the parse trees of structurally complex sentences are decomposed into parse trees for clauses. Each clause, HAIKU's unit of processing, is built around a main verb, which is essential in case-based analysis. HAIKU finds syntactically marked semantic relationships at three levels: Clause-Level Relationships (Barker & Szpakowicz 95), cases (Delisle 94; Delisle et al. 96) and Noun-Modifier Relationships (Barker 97). Case relations are essential in building predicate-argument structures, as they represent semantic relationships between a clause's main verb and its syntactic arguments (subject, objects, PPs, adverbials). Links labeled by cases usually correspond to roles in the activity denoted by the verb; for example, the AGENT case identifies an entity that acts. Cases appear in texts as predicate-argument structures, each case denoted by a syntactic marker and occupied by a particular filler. For example, in "The thief broke the window with a stone", with is the marker and a stone is the INSTRUMENT of the activity. Case-based analysis is appropriate because it fits well with syntax-centered approaches and can be presented intuitively to the user. It is DIPETT and the Case Analyzer that allow TANKA to extract predicate-argument structures directly from texts. The TANKA/READER system is fully implemented in Quintus Prolog 3.2.
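As an illustration, the result of case analysis for the example clause above might be recorded along the following lines. This is only a minimal sketch in Prolog, the system's implementation language; the predicate names (case_structure/2, case/3), the term layout and the abbreviation inst for INSTRUMENT are our own illustrative assumptions, not TANKA's actual internal format.

    % Hypothetical record of a case analysis: one
    % case(CaseLabel, Marker, Filler) triple per argument.
    % "The thief broke the window with a stone."
    case_structure(break,
        [case(agt,  psubj, 'the thief'),     % positional marker: subject
         case(obj,  pobj,  'the window'),    % positional marker: direct object
         case(inst, with,  'a stone')]).     % lexical marker: preposition "with"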

4 From Syntax Trees to Predicate-Argument Structures

A case system has been defined for use with HAIKU. (Delisle et al. 96; Barker et al. 97) present and carefully motivate a list of 28 cases, and discuss other published lists.* HAIKU's Case Analyzer (CA) takes a syntactic structure from DIPETT. For each clause, CA identifies, under the user's supervision, the case pattern that best represents its verb's meaning. Cases are signaled by case markers, realized in two ways. A lexical marker is usually a preposition; it can also be an adverb. Thus, the system collects information about a verb's complements and adjuncts. A positional marker is a surface-syntactic role: subject, direct object, indirect object (notated as psubj, pobj, piobj). Markers are used in Case-Marker Patterns (CMPs), on which semantic patterns are indexed in HAIKU. For example, the CMP psubj-pobj-at is produced for a clause where the main verb has a subject, a direct object, and a complement PP introduced by at. This leads HAIKU to the Case Pattern (CP) agt-obj-lto, if the user accepts the association of the subject with the agent, the direct object with the semantic object, and the at PP with the destination.

* Six case names appear in the examples in this paper: agt (Agent), expr (Experiencer), lfrm (LocationFrom), lto (LocationTo), obj (Object) and tat (TimeAt).

CA accumulates CMPs and CPs in its dictionaries, to which it refers as new sentences are processed. A sentence whose patterns differ significantly from previously encountered patterns may introduce new semantic facts; these are integrated into the growing dictionaries. For a sentence semantically similar to previously analyzed sentences, CA suggests an interpretation for the user to confirm or reject. This similarity is evaluated by means of a closeness metric (Delisle et al. 93): we have translated the difficult problem of finding the best approximation of a given semantic pattern into the computationally feasible task of finding the best match for a pattern of symbols (i.e. a CP) which denote basic semantic relations. Linguistically speaking, we define semantic similarity via syntactic congruence and compute it from syntactic markers (CMPs). This metric allows our system to suggest to the user valid candidate semantic patterns for the input at hand, based on the knowledge the system has accumulated so far.

The dictionaries may be empty when HAIKU is started on a new text. CA needs no seed knowledge to work properly, though the user's involvement in the initial phase is high if CA starts from scratch. That is, at time t=0 the system has no suggestions to make for the very first sentence it processes, since it relies only on syntax and does not blindly compute alternatives: it depends on information accumulated from inputs already processed. But even then, the user is constantly and systematically guided by the system. For instance, when the system has no CP suggestions for the user, it lists the potential cases that can be associated with the case markers found in the current clause; this information is fetched from HAIKU's general list of CM-case pairs and facilitates the user's job of identifying the relevant CP for the clause at hand. The intensity of interaction decreases as CA acquires more patterns. The learning accomplished by HAIKU in these circumstances is instance-based learning (Aha et al. 91): supervised, incremental, and based on similarity between old and new patterns.

We now present some details of the system, with emphasis on HAIKU's knowledge elements and the dictionaries that underlie the derivation of predicate-argument structures from text by parsing and case analysis. The following simplistic paragraph helps make points that would be obscured by a realistic text of the kind we used in our recent experimental evaluations (Atkinson 90; Larrick 61).

Bob printed the new data file. Beth and Tom will print their letters. Their boss could not print the production report on the new laser printer. The new computer caused a power failure yesterday. We know that your boss would not delete all your data. These new employees have deleted my letters from my disk.

A CMP is formally an ordered list of case markers, one for each syntactic marker in a clause. A clause analyzed by DIPETT has exactly one CMP, since it has one syntactic analysis, with all phrase attachments resolved (and perhaps adjusted by reattachment). This, as we said in Section 3, does not imply that all potential ambiguity is ruled out. We only assume that every clause will be assigned a single correct syntactic analysis (CMP) from which a single correct semantic interpretation (CP) will emerge; correctness is determined by the user.

The Meaning Dictionary (mDict) has entries for prepositions (predefined), adverbs and verbs (most of the latter usually added during HAIKU's operation). A verb entry contains a list of the CMPs found with this verb in the text, each with its count of occurrences, and a list of cases (with their counts) associated with individual markers. The numbers allow us to identify, for example,

the most frequent CMP for a given verb; probabilities could easily be computed from these numbers. An entry for a prepositional or adverbial case marker contains a list of the cases the marker can realize; this list is fixed for prepositions and almost fixed for adverbs. There are, for example, two CMPs in the mDict entry for delete after the sample paragraph has been analyzed: psubj-pobj and psubj-pobj-from; an occurrence count is kept for each CMP. For every case marker in these CMPs there is a list of the case labels it marks, with occurrence counts.

A CP is an ordered list of symbols representing the cases appearing in a clause. A semantically ambiguous clause may have more than one CP, but after the user's intervention only one CP will be assigned; in a sense, the user plays the role of a "disambiguation filter". For example, it is debatable whether the subject is an agent or an experiencer in "The snow is falling on our heads". The user would select one interpretation, and her decision would be memorized by the system. An implicit assumption in TANKA, as in any semi-automatic system, is that the cooperating user makes consistent decisions.

The Case-Marker Pattern Dictionary (cmpDict) has entries for CMPs. An entry contains a list of the CPs already associated with this CMP, and the occurrence count for each CP. For each CP there is an example sentence, to be used in interaction with the user. The cmpDict may be initialized with entries for commonly appearing CMPs. Processing the sample text with an initially empty cmpDict produces an entry for the CMP psubj-pobj, linking it to the CP agt-obj with four occurrences. The CMP psubj-pobj-from is linked to the CP agt-obj-lfrm and the CMP psubj-pobj-on to the CP agt-obj-lto, each with one occurrence. A representative example sentence from the actual text (approved by the user) is associated with each entry.

The Case Pattern Dictionary (cpDict) has CP entries, each containing a list of the verbs associated with this CP in the text. For the sample text, the CP agt-obj is associated with three verbs: delete, know, print. The CP agt-obj-lfrm is associated with the verb delete, and the CP agt-obj-lto with the verb print.
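In Prolog terms, the dictionary contents for the sample paragraph could be pictured as the facts below. This is a minimal sketch under our own naming assumptions: mdict_verb/3, cmpdict/4 and cpdict/2 are illustrative predicates, not HAIKU's actual data structures, the counts for delete are derived from the sample paragraph, and the example sentences are plausible picks from it, since we do not know which sentences the user would approve as representatives.

    % Sketch of mDict, cmpDict and cpDict contents after the sample
    % paragraph has been analyzed (illustrative names and layout).
    % mdict_verb(Verb, CMP, OccurrenceCount)
    mdict_verb(delete, psubj-pobj, 1).
    mdict_verb(delete, psubj-pobj-from, 1).
    % cmpdict(CMP, CP, OccurrenceCount, ExampleSentence)
    cmpdict(psubj-pobj, agt-obj, 4,
            'Bob printed the new data file.').
    cmpdict(psubj-pobj-from, agt-obj-lfrm, 1,
            'These new employees have deleted my letters from my disk.').
    cmpdict(psubj-pobj-on, agt-obj-lto, 1,
            'Their boss could not print the production report on the new laser printer.').
    % cpdict(CP, VerbsSeenWithThisCP)
    cpdict(agt-obj, [delete, know, print]).
    cpdict(agt-obj-lfrm, [delete]).
    cpdict(agt-obj-lto, [print]).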

ccvpIndex is a structured index that serves two purposes: providing all the data needed to complete a predicate-argument structure, and facilitating access to the results of parsing and Case Analysis. Results are saved by HAIKU in a distinct file and indexed on the numbers of the text's input units. A ccvpIndex entry is created for every unique CMP-CP-verb combination; it stores the unit number #(N), a surface subcategorization frame for each occurrence of the verb (sr_types), and the tense of each occurrence. For example, ccvpIndex tells us that the verb delete occurred in unit #6 with the CMP psubj-pobj-from, that its case markers were realized as a noun phrase, another noun phrase and a from-prepositional phrase, respectively, and that the associated CP is agt-obj-lfrm.

Thanks to the information it accumulates in its dictionaries, HAIKU automatically builds predicate-argument structure entries for verbs. Every list element for a verb represents a CMP, a CP and a phrase pattern. For the sample text above, these entries are:

    verb_pred_struct(cause,
        [psubj-pobj-adv/agt-obj-tat/np-np-adv]).
    verb_pred_struct(delete,
        [psubj-pobj/agt-obj/np-np,
         psubj-pobj-from/agt-obj-lfrm/np-np-from]).
    verb_pred_struct(know,
        [psubj-pobj/expr-obj/np-nom_cl]).
    verb_pred_struct(print,
        [psubj-pobj/agt-obj/np-np,
         psubj-pobj-on/agt-obj-lto/np-np-on]).

Order does not matter in patterns: only their semantic interpretation counts. For example, subj-obj-at-by is equivalent to subj-obj-by-at. Interestingly, experience with TANKA (Barker & Delisle 96) shows that patterns never need to be reordered. We conjecture that the default SVO order in HAIKU corresponds to the reality of well-written English texts.
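To close this section, the CMP-based suggestion mechanism described above can be approximated as follows. The published closeness metric (Delisle et al. 93) is considerably more elaborate; the sketch below is a much-simplified stand-in, not the system's actual algorithm. It scores each known CMP by the number of markers it shares with the new clause's CMP, ignoring marker order, and proposes the associated CPs, best match first. It assumes the illustrative cmpdict/4 facts sketched earlier and SWI-Prolog library predicates.

    :- use_module(library(lists)).   % append/3, intersection/3, member/2

    % cmp_markers(+CMP, -Markers): flatten a pattern such as
    % psubj-pobj-from into the list [psubj, pobj, from].
    cmp_markers(A-B, Markers) :-
        !,
        cmp_markers(A, Init),
        append(Init, [B], Markers).
    cmp_markers(Marker, [Marker]).

    % shared(+CMP1, +CMP2, -N): number of markers the patterns share.
    shared(CMP1, CMP2, N) :-
        cmp_markers(CMP1, M1),
        cmp_markers(CMP2, M2),
        intersection(M1, M2, Common),
        length(Common, N).

    % suggest_cps(+NewCMP, -RankedCPs): candidate CPs, best match first.
    suggest_cps(NewCMP, RankedCPs) :-
        findall(N-CP,
                ( cmpdict(KnownCMP, CP, _Count, _Example),
                  shared(NewCMP, KnownCMP, N) ),
                Scored),
        sort(0, @>=, Scored, Sorted),   % highest score first, ties kept
        findall(CP, member(_-CP, Sorted), RankedCPs).

    % ?- suggest_cps(psubj-pobj-from, CPs).
    % CPs = [agt-obj-lfrm, agt-obj-lto, agt-obj]
    % (ties broken by the standard order of terms)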

[Figure 1: line plot; y-axis "Case Patterns" (0 to 300), x-axis "Clauses" (0 to 400); two curves labeled "system" and "user".]

Figure 1. The total number of case patterns determined by the system and patterns supplied by the user during experimental evaluation.

5 Experimental Evaluation

We have argued that the validation of our case system comes from its use in the case analysis of a large number of real English sentences. The system has been built for English, but we consider our approach valid for other languages, even free-order ones, in which cases are present and for which a comprehensive surface-syntactic parser exists. Previously, we tested HAIKU's case analyzer on such English texts as the Canadian income tax guide and a fourth-generation computer language manual. Since we settled on the current set of twenty-eight cases, no tests have identified a need for new cases; however, none of those tests were conducted primarily to validate the coverage of the cases themselves. The test described in (Barker & Delisle 96), referred to here as the case test, was designed with several explicit goals, among them the experimental validation of the case system.

The case test was conducted on (Larrick 61). This book uses less complicated language than many technical texts, allowing us to get a high percentage of sentences parsed well enough to exercise the HAIKU semantic analyzer. The book's 513 sentences yielded 439 finite, non-stative clauses usable for case analysis. (Stative clauses were parsed but not case-analyzed, because stative clauses usually express noun-modifier relationships rather than cases.) The system could assemble automatically the correct CMP for 69% of the clauses; the user supplied correct CMPs for the remaining clauses. Starting with an empty processing history, the system suggested case patterns for each CMP, depending on previous processing. Over all 439 clauses, the system made on average 4.47 CP suggestions per CMP. After processing all clauses, the maximum number of CP suggestions made for any single clause was fourteen. The increase in this maximum was quick in the first half of the experiment (from 0 to 11 over the first 200 clauses) but slowed in the second half, where the maximum increased by only 3 over the last 200 clauses. The correct case pattern was among the system's suggestions for 62% of the 439 clauses: 50% for the first 208 clauses and 72% for the next 231 clauses. Figure 1 shows the number of correct case patterns determined automatically by the system over the course of the experiment, contrasted with the number of patterns that had to be supplied by the user.

Case assignment was simple for 87% of the clauses, whether the case pattern was suggested by the system or supplied by the user. By simple we mean that for each verb-argument relationship there was a single case that best captured the semantics of the relationship, and that case was obvious. Case assignment for the remaining 13% of the clauses was more difficult, requiring some consultation of the definitions of the cases. Other findings:
• The number of CPs suggested to the user increases slowly throughout the analysis of a text before reaching a plateau.
• As more clauses are analyzed, the system learns to make better suggestions: the user is more likely to find the appropriate CP among the suggestions and less likely to have to supply the CP herself.
• The interaction time spent by the user on each sentence decreases on average throughout the analysis of a text. Overall, the user spent just under two minutes per input on average, including syntactic and semantic analysis.
• 80% of the sentences were completely parsed; 85% were parsed well enough for HAIKU to proceed.
• No new cases were needed to capture the semantics of clauses in the test text, and only 4 of the 28 cases were not used. This suggests that the case system has good coverage.

• The system made CP suggestions based on past processing. The number of suggestions did not grow unmanageably over time. HAIKU's suggestions included the best CP in 262 of 429 situations (61%). These numbers include the training phase (starting from an empty dictionary). Note that the system changes as more sentences are processed: the traditional training and testing phases cannot be clearly told apart, since they coexist throughout HAIKU's processing.
• The relationship between sentence length (in tokens) and parse errors was inconclusive. CMP correctness was directly related to the severity of parse errors, and sentence length was directly related to average user time.

6 Conclusion

User participation of the kind employed in our system carries a cost that is justified when a detailed and carefully crafted model of a domain is sought. It should be assumed that the domain is relatively well defined and can be described by a text of manageable length. In our experimental evaluations, we processed approximately 15-20 inputs an hour; this includes numerous inputs with multiple clauses, and accounts for a complete analysis at all levels of semantic relationships. A typical application could be the construction of a knowledge base or a lexicon for the domain. Alternatively, a simple filter on a much larger text could identify passages that require detailed TANKA-style processing; for example, one could concentrate on paragraphs that contain keywords produced by a keyword extractor (such as http://ai.iit.nrc.ca/II_public/extractor.html).

The information collected in HAIKU's dictionaries is derived from the syntactic clues picked up by a surface-syntactic parser. All necessary syntactic and case-related semantic data is collected for profiling the text and for simple frequency analyses. Part of this information allows the user (a knowledge engineer) to semi-automatically extract predicate-argument structures directly from text in a systematic way. The collected data can then be used to build or supplement a lexicon for a specific NLP application or system. Such facts, and other information in HAIKU's dictionaries, can also be of interest to the linguist who wants to study certain phenomena in real texts. The cpDict could serve as a simple verb classification, for instance in the sense of (Dixon 91; Levin 93): all verbs that occur with a given CP have something in common.

Noun-Modifier Relationship analysis (Barker 97) has very recently been integrated into TANKA. The results of another detailed experimental evaluation have been gathered for a new text (Atkinson 90), much more complicated syntactically and semantically than (Larrick 61). A preliminary analysis shows that all the findings of the first experiment have been confirmed (with very similar percentage values), in particular the learning pattern of case analysis discussed in Section 5 and the associated decrease in the user's contribution. Details will appear soon in a technical report. We believe this confirmation also supports the validity of the case analysis algorithm and of the case system on which we have based our method. Modules of the TANKA system can be

applied in any NLP system based on deep parsing where it is natural to link surface syntax and semantics via cases.

Future work includes enhancing the reattachment module by allowing only very limited reattachment moves and verifying all resulting structures (the updated parse tree must be legal in DIPETT in order to be processed correctly by HAIKU). We also plan the full integration of WordNet categories for nouns. This will help make predicate-argument structures more specific by associating a taxonomic marker with every CM "filler" in the subcategorization frame. These markers will allow more discriminating Case Analysis by making distinctions such as that between agent-animate and agent-artifact.

We also have to define a meaningful measure of HAIKU's learning progress. The traditional precision/recall measures do not apply naturally: HAIKU is designed to oscillate continually between training and testing as it progresses through the text. Precision would be high due to the robustness of the parser and the semantic analyzer working together, along with the competence of the cooperating user. We conjecture that recall would also be quite high because of the way in which well-written technical texts are generally constructed: important concepts, repeated throughout the text, will be caught by the system despite an occasional parsing failure or a less successful semantic analysis. A more substantive extension, in the spirit of (Briscoe & Carroll 97), would be the use of the accumulated CMP and CP patterns to improve parsing accuracy. It would be interesting to measure whether such a gain in parsing accuracy translated into significant gains in the accuracy and coverage of HAIKU's semantic analysis.
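For illustration, a taxonomically enriched entry might pair each case with the WordNet-derived category of its filler, along the lines sketched below. The notation agt(animate), obj(artifact) and the predicate name are our own assumptions about a design that is still future work, not an implemented format.

    % Hypothetical shape of a predicate-argument entry once WordNet
    % categories are integrated; compare the plain entries for print
    % in Section 4.
    verb_pred_struct_tax(print,
        [psubj-pobj/agt(animate)-obj(artifact)/np-np,
         psubj-pobj-on/agt(animate)-obj(artifact)-lto(artifact)/np-np-on]).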

Acknowledgments This work has been supported by the Natural Sciences and Engineering Research Council of Canada. Many thanks to Ken Barker and Terry Copeck for their contribution.

References

(Aha et al. 91) D. W. Aha, D. Kibler, M. K. Albert. "Instance-Based Learning Algorithms", Machine Learning, 6, 37-66, 1991.
(Atkinson 90) H. F. Atkinson. Mechanics of Small Engines, McGraw-Hill, 1990.
(Barker & Delisle 96) K. Barker, S. Delisle. "Experimental Validation of a Semi-Automatic Text Analyzer", Technical Report TR-96-01, Dept. of Computer Science, Univ. of Ottawa, 1996.
(Barker & Szpakowicz 95) K. Barker, S. Szpakowicz. "Interactive Semantic Analysis of Clause-Level Relationships", Proc PACLING 1995, Pacific Association for Computational Linguistics (Brisbane, Australia), 22-30, 1995.
(Barker 97) K. Barker. "Noun Modifier Relationship Analysis in the TANKA System", Technical Report TR-97-02, Dept. of Computer Science, Univ. of Ottawa, 1997.
(Barker et al. 97) K. Barker, T. Copeck, S. Delisle, S. Szpakowicz. "Systematic Construction of a Versatile Case System", to appear in Journal of Natural Language Engineering, 1997.
(Basili et al. 92) R. Basili, M. T. Pazienza, P. Velardi. "Computational Lexicons: the Neat Examples and the Odd Exemplars", Proc 3rd Conf on Applied Natural Language Processing (Trento, Italy), 96-103, 1992.
(Basili et al. 94) R. Basili, M. T. Pazienza, P. Velardi. "The Noisy Channel and the Braying Donkey", in J. Klavans and Ph. Resnik (1994), 21-28, 1994.
(Brent 91) M. R. Brent. "Automatic Acquisition of Subcategorization Frames from Untagged Text", Proc 29th Annual Meeting of the Association for Computational Linguistics (Berkeley, California, USA), 209-214, 1991.
(Brill 92) E. Brill. "A Simple Rule-Based Part of Speech Tagger", Proc 3rd Conf on Applied Natural Language Processing (Trento, Italy), 152-155, 1992.
(Briscoe & Carroll 97) T. Briscoe, J. Carroll. "Automatic Extraction of Subcategorization from Corpora", Proc 5th Conf on Applied Natural Language Processing (Washington, DC, USA), 1997. (http://xxx.lanl.gov/list/cmp-lg/9702002)
(Cardie 93) C. Cardie. "A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis", Proc 11th National Conf on Artificial Intelligence (Washington, DC, USA), 798-803, 1993.
(Copeck et al. 97) T. Copeck, K. Barker, S. Delisle, S. Szpakowicz, J.-F. Delannoy. "What is technical text?", to appear in Language Sciences, 1997.
(Delisle & Szpakowicz 95) S. Delisle, S. Szpakowicz. "Realistic Parsing: Practical Solutions of Difficult Problems", Proc PACLING 1995, Pacific Association for Computational Linguistics (Brisbane, Australia), 59-68, 1995.
(Delisle 94) S. Delisle. "Text Processing without A-Priori Domain Knowledge: Semi-Automatic Linguistic Analysis for Incremental Knowledge Acquisition", Ph.D. thesis, TR-94-02, Dept. of Computer Science, Univ. of Ottawa, 1994.
(Delisle 96) S. Delisle. "Le traitement automatique du langage naturel au service de l'ingénieur de la connaissance : le système READER" [Natural language processing in the service of the knowledge engineer: the READER system], Proc International Conf on Natural Language Processing and Industrial Applications (Moncton, New Brunswick, Canada), Volume I, 60-66, 1996.
(Delisle et al. 93) S. Delisle, T. Copeck, S. Szpakowicz, K. Barker. "Pattern Matching for Case Analysis: A Computational Definition of Closeness", in O. Abou-Rabia, C. K. Chang, W. W. Koczkodaj (eds.), Proc 5th Intl Conf on Computing and Information, ICCI-93 (Sudbury, Ontario, Canada), 310-315, 1993.
(Delisle et al. 96) S. Delisle, K. Barker, T. Copeck, S. Szpakowicz. "Interactive Semantic Analysis of Technical Texts: Case Pattern Acquisition", Computational Intelligence, 12(2), 273-306, 1996.
(Dixon 91) R. M. W. Dixon. A New Approach to English Grammar, On Semantic Principles, Oxford Univ. Press, 1991.
(Framis 94) F. R. Framis. "An Experiment on Learning Appropriate Selectional Restrictions from a Parsed Corpus", Proc COLING-94 (Kyoto, Japan), 769-774, 1994.
(Gomez et al. 94) F. Gomez, R. Hull, C. Segami. "Acquiring Knowledge from Encyclopedic Texts", Proc 4th Conf on Applied Natural Language Processing (Stuttgart, Germany), 84-90, 1994.
(Grishman & Sterling 94) R. Grishman, J. Sterling. "Generalizing Automatically Generated Selectional Patterns", Proc COLING-94 (Kyoto, Japan), 742-747, 1994.
(Grishman et al. 94) R. Grishman, C. Macleod, A. Meyers. "Complex Syntax: Building a Computational Lexicon", Proc COLING-94 (Kyoto, Japan), 268-272, 1994.
(Guthrie et al. 96) L. Guthrie, J. Pustejovsky, Y. Wilks, B. M. Slator. "The Role of Lexicons in Natural Language Processing", Comm ACM, 39(1), special section on NLP, 63-72, 1996.
(Hastings & Lytinen 94) P. M. Hastings, S. L. Lytinen. "The Ups and Downs of Lexical Acquisition", Proc AAAI-94 (Seattle, Washington, USA), 754-759, 1994.
(Jensen 91) K. Jensen. "A Broad-Coverage Natural Language Analysis System", in M. Tomita (ed.), Current Issues in Parsing Technology, Kluwer, 1991.
(Klavans & Resnik 94) J. Klavans, Ph. Resnik (eds.). Proc ACL Workshop "The Balancing Act: Combining Symbolic and Statistical Approaches to Language" (Las Cruces, New Mexico, USA), 1994.
(Larrick 61) N. Larrick. Junior Science Book of Rain, Hail, Sleet and Snow, Garrard Publishing Co., Champaign, 1961.
(Levin 93) B. Levin. English Verb Classes and Alternations (A Preliminary Investigation), Univ. of Chicago Press, 1993.
(Liu & Soo 93) R.-L. Liu, V.-W. Soo. "An Empirical Study on Thematic Knowledge Acquisition Based on Syntactic Clues and Heuristics", Proc 31st Annual Meeting of the ACL (Columbus, Ohio, USA), 243-250, 1993.
(Manning 93) C. D. Manning. "Automatic Acquisition of a Large Subcategorization Dictionary from Corpora", Proc 31st Annual Meeting of the ACL (Columbus, Ohio, USA), 235-242, 1993.
(Myaeng et al. 94) S. H. Myaeng, C. Khoo, M. Li. "Linguistic Processing of Text for a Large-Scale Conceptual Information Retrieval System", Proc 2nd International Conf on Conceptual Structures (Maryland, USA), in W. M. Tepfenhart, J. P. Dick and J. F. Sowa (eds.), Lecture Notes in AI #835, Springer-Verlag, 69-83, 1994.
(Ogonowski et al. 94) A. Ogonowski, M. L. Herviou, E. Dauphin. "Tools for Extracting and Structuring Knowledge from Texts", Proc COLING-94 (Kyoto, Japan), 1049-1053, 1994.
(Poznanski & Sanfilippo 96) V. Poznanski, A. Sanfilippo. "Detecting Dependencies between Semantic Verb Subclasses and Subcategorization Frames in Text Corpora", in B. Boguraev and J. Pustejovsky (eds.), Corpus Processing for Lexical Acquisition, MIT Press, 175-190, 1996.
(Pugeault et al. 94) F. Pugeault, P. Saint-Dizier, M.-G. Monteil. "Knowledge Extraction from Texts: A Method for Extracting Predicate-Argument Structures from Texts", Proc COLING-94 (Kyoto, Japan), 1039-1043, 1994.
(Quirk et al. 85) R. Quirk, S. Greenbaum, G. Leech, J. Svartvik. A Comprehensive Grammar of the English Language, Longman, 1985.
(Ribas 95) F. Ribas. "On Learning More Appropriate Selectional Restrictions", Proc 7th Conf of the European Chapter of the ACL, EACL-95 (Dublin, Ireland), 112-118, 1995.
(Riloff 93) E. Riloff. "Automatically Constructing a Dictionary for Information Extraction Tasks", Proc 11th National Conf on Artificial Intelligence (Washington, DC, USA), 811-816, 1993.
(Saint-Dizier & Viegas 95) P. Saint-Dizier, E. Viegas (eds.). Computational Lexical Semantics, Cambridge University Press, 1995.
(Sanfilippo 94) A. Sanfilippo. "Word Knowledge Acquisition, Lexicon Construction and Dictionary Compilation", Proc COLING-94 (Kyoto, Japan), 273-277, 1994.
(Sekine et al. 92) S. Sekine, J. J. Carroll, S. Ananiadou, J. Tsujii. "Automatic Learning for Semantic Collocation", Proc 3rd Conf on Applied Natural Language Processing (Trento, Italy), 104-110, 1992.
(Wilks et al. 96) Y. A. Wilks, B. M. Slator, L. M. Guthrie. Electric Words (Dictionaries, Computers, and Meanings), MIT Press, 1996.
