QUASI-LOGICAL FORM TO CASE FRAME MAPPING FOR ENGLISH TO TURKISH MACHINE TRANSLATION

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF THE MIDDLE EAST TECHNICAL UNIVERSITY BY BAHADIR PEHLİVANTÜRK

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN THE DEPARTMENT OF COMPUTER ENGINEERING

DECEMBER 1996

Approval of the Graduate School of Natural and Applied Sciences.

Prof. Dr. Tayfur Öztürk
Director

I certify that this thesis satisfies all the requirements as a thesis for the degree of Master of Science.

Prof. Dr. Fatoş Yarman Vural
Head of Department

This is to certify that we have read this thesis and that in our opinion it is fully adequate, in scope and quality, as a thesis for the degree of Master of Science.

Assist. Prof. Dr. Cem Bozşahin
Supervisor

Examining Committee Members

Assoc. Prof. Dr. Mehmet Tolun
Assoc. Prof. Dr. Deniz Zeyrek
Asst. Prof. Dr. Volkan Atalay
Asst. Prof. Dr. Cem Bozşahin
Asst. Prof. Dr. Halit Oğuztüzün

ABSTRACT

QUASI-LOGICAL FORM TO CASE FRAME MAPPING FOR ENGLISH TO TURKISH MACHINE TRANSLATION

Pehlivantürk, Bahadır
MS., Department of Computer Engineering
Supervisor: Assist. Prof. Dr. Cem Bozşahin
December 1996, 70 pages

This study is designed to be a part of the English-to-Turkish Translation component of a Machine Translation System. It converts the output of an English parser to a structural representation called Case Frames (CF). The basic approach to MT utilizes three modules: source language analysis, transfer, and target language generation. The parser is a system called the Core Language Engine (CLE), which produces Quasi-Logical Forms (QLF) of English sentences. However, the transfer phase of the project uses Case Frame representations. This study produces the module which maps QLFs to English CFs to be used in the transfer phase. The mapping rules use unification to produce the right CFs, and the system runs on SICStus Prolog. The mapping of each QLF construct is determined by its grammatical and pragmatic functions, and lexical properties. The system has been tested with the prototype generation module that produces surface forms.

Keywords: Machine Translation, Natural Language Processing


ÖZ

MAPPING FROM QUASI-LOGICAL REPRESENTATION TO CASE REPRESENTATION IN ENGLISH-TO-TURKISH TRANSLATION

Pehlivantürk, Bahadır
M.S., Department of Computer Engineering
Supervisor: Assist. Prof. Dr. Cem Bozşahin
December 1996, 70 pages

This study is designed as a part of the English-to-Turkish translation component of a Machine Translation System. It converts the output of an English parser into a structure called a Case Frame. The basic machine translation approach uses three modules: source language analysis, transfer, and target language generation. The parser is a system called CLE, which produces a symbolic structure called QLF. However, the transfer part of this project uses Case Frames. This study produces the module that performs the conversion from QLF to English Case Frames for use in the transfer part. The translation rules use unification to produce the correct Case Frame, and the system runs on SICStus Prolog. The translation of each QLF construct is carried out according to its grammatical and pragmatic functions and its syntactic properties. The system has been tested with the prototype generation module that produces surface forms.

Keywords: Machine Translation, Natural Language Processing


In dedication to my father


ACKNOWLEDGMENTS

First of all, I would like to thank Cem Bozşahin and Çiğdem Keyder (Turhan) for their contributions with corrections and discussions. I would also like to thank SRI International, and especially Steve Pulman, for providing nearly all of the material used to prepare Chapter 3 and the rewrite rule interpreter he wrote, which eased my task considerably. I am also grateful to David Milward for answering my questions on CLE and QLFs, which occurred frequently during this study, and for personally teaching me how to modify the lexicon of the CLE system.

Thanks are also due to the NATO TU-LANGUAGE and TÜBİTAK EEEAG90 projects for providing the development environment and research materials. Hardware and software resources of the Laboratory for the Computational Studies of Language (LcsL) have been used in all stages of the preparation of this thesis.

And last but certainly not least, I would like to thank all of my friends and family for their encouragement, support, friendship and love during the preparation of this thesis.


TABLE OF CONTENTS

ABSTRACT
ÖZ
DEDICATION
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1 INTRODUCTION
2 THE CORE LANGUAGE ENGINE AND THE QUASI-LOGICAL FORMS
    2.1 The Core Language Engine
    2.2 Overview of the CLE Components
    2.3 Quasi-Logical Forms
        2.3.1 Syntax of the QLF Language
            2.3.1.1 terms and forms
            2.3.1.2 form Resolutions
            2.3.1.3 term Resolutions
        2.3.2 Example QLFs
3 QUASI-LOGICAL FORM TO ENGLISH CASE FRAME TRANSFER
    3.1 The form Construct
    3.2 The term Construct
    3.3 The Structure of the QLF
4 CONCLUSION

REFERENCES

APPENDICES

A MODIFYING THE CLE
    A.1 Running the System
    A.2 Modifying the QLF-to-CF Module
    A.3 Making New Lexical Entries to the CLE System
B QLF TO CF MAPPING RULES
C REWRITE RULE INTERPRETER
D TOP LEVEL COMMANDS FOR PERFORMING ENGLISH TO ENGLISH CASE FRAME MAPPING
E EXAMPLE OUTPUTS OF THE QLF-TO-CF MODULE
F TURKISH AND ENGLISH CASE FRAME STRUCTURES
    F.1 Turkish Case Frame Structure
    F.2 English Case Frame Structure

LIST OF TABLES

3.1 Verb Attributes in CLE
3.2 Noun categories in CLE
3.3 Sentence Type Indicators in CLE

LIST OF FIGURES

1.1 Machine Translation Methods
1.2 MT Component of the TU-LANGUAGE Project
2.1 Broad overview of the CLE architecture
2.2 Inputs to CLE processing phases in analysis direction

CHAPTER 1

INTRODUCTION

This study is part of the Turkish Natural Language Processing (TU-LANGUAGE) project. More specifically, it is designed to be a part of the English-to-Turkish Translation component of the Machine Translation System. It is the subcomponent that converts the output of the English parser to a structural representation called Case Frames.

The TU-LANGUAGE project is a 5-year research and development effort for natural language processing (NLP) in Turkish. It aims to establish the necessary computational foundations for natural language processing in Turkish. It also aims to develop a number of sophisticated natural language processing applications, built upon these foundations, for performing various tasks. The project is being conducted by a team of researchers from the Middle East Technical University and Bilkent University. There are also participants from various domestic and international computer companies [6].

The project mainly emphasizes the development of reusable and maintainable language processing modules that can be used in the future in other natural language processing applications. One of the applications included within the project is a Human-assisted Machine Translation System to translate texts from English to Turkish within highly constrained domains. For this task, a sublanguage is determined to limit the domain of discourse and the lexicon, though this does not limit the grammatical coverage. The initial target is to translate English IBM computer manuals into Turkish.

The general task of Machine Translation can be described as:

    Feed a text in one language (SL, for source language) into a computer and, using a computer program, produce a text in another language (TL, for target language), such that the meaning of the TL text is the same as the meaning of the SL text [12].

Research in Machine Translation started in the late 1940s. At the dawn of the MT age, machine translation seemed to be a very attractive and feasible application of computer technology. The first scientific conference on MT was held in 1952 at MIT, and the first public demonstration of a translation program took place in 1954 at Georgetown University. This program could translate about 50 Russian sentences selected from texts on chemistry into English and included about 250 words.

Through the 1950s and into the following decade, research in MT continued and grew. The requirements of MT gave rise to significant theoretical developments in linguistics, which later became known as the disciplines of computational linguistics and artificial intelligence. However, improving on the Georgetown experiment proved to be very difficult, as translation quality declined with expanded coverage. The early MT projects failed to reach their goal of building systems for fully automated high-quality machine translation. This held back MT research for about 15 years.

The revival of MT as a scientific discipline and an application of linguistic and computer technology starts with the Eurotra project and the MT efforts in Japan. Eurotra, begun in 1978, was a project aimed at providing MT capability among the official EEC languages. Japanese MT efforts started around 1980, most notably with the Mu project at Kyoto University [12]. The first MT effort in Turkey was at Middle East Technical University (METU) by Zeki Sagay [14]. However, the most extensive effort has been the TU-LANGUAGE Project.

In most of the Machine Translation (MT) literature, approaches to MT are divided into three mainstreams: the direct approach, the transfer approach, and the interlingua approach.
These are in increasing order of complexity and extensibility [5]. Figure 1.1 shows these three approaches.

[Figure 1.1: Machine Translation Methods. Labels recoverable from the diagram: Source Language, Target Language, Direct Translation, Analysis, Transfer, Generation, Source Language Representation, Target Language Representation, Interlingua.]

In direct translation systems, languages are translated by replacing source language words with target language words. Such a translation system is appropriate for applications where the text has a limited vocabulary and a well-defined style in terms of grammar and semantics. Most of the early translation systems used this approach. For instance, US military and intelligence organizations used such systems so that non-Russian speakers could judge developments in the Eastern Bloc. In this way a large survey of the Russian documentation became possible. The two best-known examples of direct translation systems are SYSTRAN and SPANAM [8].

The transfer approach separates the translation process into three general stages: analysis, transfer and synthesis. These systems analyze the input sentence and then apply syntactic and lexical rules, called transfer rules, to map grammatical information from one language to another. To identify the structure of the input sentence, transfer systems use parsers. The most prominent examples are TAUM-METEO [9], METAL [4] and EUROTRA [10].

Interlingual systems translate the text using an underlying data representation called an interlingua [7]. Interlingual systems make use of a representation system which is in a sense "deeper" than the surface grammatical information. For instance, they would encode a grammatical subject as agent, experiencer or patient (in passives). A midway point between pure transfer and pure interlingua is the structural interlingua, where the underlying representation is "closer" to the surface representation than in interlingual systems, and the mapping (transfer) is defined from grammatical information in one language to that of another. For instance, a structural mapping might convert an English prepositional phrase (PP) to a case-marked noun phrase (NP) in Turkish, rather than setting a correspondence between the thematic roles denoted by the phrases [5]. Two examples of interlingual systems are TRANSLATOR and ULTRA.

The basic approach to MT utilizes three modules: source language analysis, transfer, and target language generation. The source language analysis module maps the source language text into a form which is then processed by the transfer module utilizing lexical and structural transfer rules. The resulting structure is then processed by the target language generation module. The schema for the MT part of the TU-LANGUAGE project is depicted in Figure 1.2.

[Figure 1.2: MT Component of the TU-LANGUAGE Project. Labels recoverable from the diagram: Source Language Text (English), Analyzer (CLE), Source QLF to Source CF Mapping, Source CF to Target CF Mapping, Generator (Turkish), Filters, Target Language Text; the modules are grouped under Analysis, Mapping (transfer) and Generation.]

This project can be viewed in three parts: source language analysis, mapping, and target language generation. The first three modules in Figure 1.2 deal with the analysis part. The second module shows the source language analysis and parsing part, which is being done by SRI International. The parser is a system called the Core Language Engine (CLE), which produces Quasi-Logical Forms (QLF) of English sentences. This parser is a comprehensive English parser which is capable of analyzing the kinds of sentences in the IBM corpus. This includes bracketing of special names, recognizing the special forms in the corpus, and marking topic/focus information, which is critical for Turkish word-order variations.

The third module shows the tool designed as part of this thesis. This is the interface between the output of CLE and the transfer phase, which is going to use Case Frame (CF) structures. This module is necessary for the extensibility of the system for future development. The Case Frame representation is independent of the CLE's logical form (QLF). This makes it possible for the system to be used with another source language. Once the output of the analysis phase of a new source language (e.g. Japanese) is transferred into Case Frame structures, the system can generate the corresponding Turkish sentences from these Case Frames.

The fourth module shows the mapping phase, in which English CFs are mapped into Turkish CFs. In the generation part, the Turkish sentence is generated from the Turkish CFs produced in the mapping part.

The CLE system of SRI is briefly described in Chapter 2. Chapter 3 explains the QLF and CF structures, which are the two ends of the transfer module. It also describes in detail how the transfer module maps QLF expressions to CFs. Appendix A gives information on how to modify the QLF-to-CF module and how to make new lexical entries in CLE.
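The module chain described in this chapter can be sketched as a composition of four stages. The sketch below is written in Python rather than in the Prolog rule notation used by the thesis, and every name in it (`make_translator`, the stage parameters, the toy stand-ins) is a hypothetical illustration, not part of the TU-LANGUAGE code. It only aims to show why the Case Frame step decouples the later stages from the QLF: nothing after `to_source_cf` ever sees a QLF.

```python
from typing import Callable, Dict

CaseFrame = Dict[str, object]  # hypothetical stand-in for a CF structure


def make_translator(
    analyze: Callable[[str], object],
    to_source_cf: Callable[[object], CaseFrame],
    transfer: Callable[[CaseFrame], CaseFrame],
    generate: Callable[[CaseFrame], str],
) -> Callable[[str], str]:
    """Chain the stages; only the first two know about the parser's output."""

    def translate(sentence: str) -> str:
        qlf = analyze(sentence)          # source analysis (CLE in the thesis)
        source_cf = to_source_cf(qlf)    # QLF-to-CF mapping (the thesis module)
        target_cf = transfer(source_cf)  # source CF to target CF mapping
        return generate(target_cf)       # target-language generation

    return translate


# Toy stand-ins, purely illustrative: they tag the data instead of doing
# any linguistic processing.
toy = make_translator(
    analyze=lambda s: ("qlf", s.split()),
    to_source_cf=lambda qlf: {"lang": "en", "words": qlf[1]},
    transfer=lambda cf: {"lang": "tr", "words": cf["words"]},
    generate=lambda cf: "<%s> %s" % (cf["lang"], " ".join(cf["words"])),
)
print(toy("John is sleeping"))
```

Supporting a new source language in this arrangement would mean supplying a different `analyze` and `to_source_cf` pair while leaving `transfer` and `generate` untouched.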


CHAPTER 2

THE CORE LANGUAGE ENGINE AND THE QUASI-LOGICAL FORMS

2.1 The Core Language Engine

The Core Language Engine (CLE) [3] is a general-purpose system for mapping between natural language sentences and logical form representations of their meaning. CLE's primary goal was to achieve a linguistically well-motivated, substantial syntactic and semantic coverage of English while at the same time being as independent as possible of particular domains of discourse.

CLE has adopted a modular design. In this design, explicit intermediate levels of linguistic representation are used as an interface between successive phases of analysis. The final result is a set of fully specified logical forms (LF) representing possible literal meanings of the input sentence. Besides the usual scientific and engineering benefits of modularity, this approach makes it possible for CLE to be used in applications for which an intermediate level of linguistic representation is more suitable. Two such levels of intermediate linguistic representation are the parse trees produced by the syntactic analysis phase and the quasi-logical forms (QLFs) produced by the initial semantic analysis phase. Examples of QLFs can be found in Appendix E.

In the CLE architecture, the QLF representation, which may be thought of informally as a "contextually sensitive" logical form, became central to the overall design, as shown schematically in Figure 2.1 [3]. In this interactive machine translation project QLFs are being used.

[Figure 2.1: Broad overview of the CLE architecture. Labels recoverable from the diagram: sentence, syntactic & semantic analysis/synthesis, lexical entries, rules, QLF, logical form transformations, lexical acquisition, context, LF, application.]

The QLF representation is independent of the influence of context. It results from purely linguistic processing by the retrieval of lexical entries and the application of syntactic and semantic rules. The difference between QLFs and fully specified logical forms is that QLFs may contain quantifiers and operators whose scope has not yet been determined, and also "anaphoric expressions" that stand for entities and relations to be determined by reference resolution. For instance, pronouns are anaphoric terms, and the entity a specific pronoun refers to is resolved after the creation of the QLF.

The basic mechanism for passing information during linguistic analysis and generation, and, to a lesser extent, during the interpretation process in the CLE design is unification¹. Unification is an operation that supports the incremental solution of systems of constraints as new constraints are added. During the analysis of a sentence, unification is used to ensure that the constraints associated with its constituent phrases are compatible as specified by the rules of grammar.

¹ Let S be a set of expressions. When a substitution θ transforms every expression in S into the same expression, θ is said to unify S (or to be a unifier of S) and the set S is said to be unifiable [13].
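The definition of unification given above can be made concrete with a small sketch. The code below is in Python, not the Prolog that CLE and the thesis rules actually run on, and it is a simplified textbook-style unifier (no occurs check) rather than CLE's mechanism. The term encoding is an assumption made for the example: variables are strings starting with an uppercase letter, compound terms are tuples, and everything else is a constant, so constants are written in lowercase here.

```python
def is_var(t):
    # Assumed convention for this sketch: strings starting with an
    # uppercase letter or "_" are variables.
    return isinstance(t, str) and (t[:1].isupper() or t[:1] == "_")


def walk(t, subst):
    # Follow bindings until reaching an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst=None):
    """Return a substitution (dict) that unifies a and b, or None on failure."""
    subst = dict(subst or {})
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        elif isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
            stack.extend(zip(x, y))  # unify functor and arguments pairwise
        else:
            return None  # clash between two different constants or shapes
    return subst


# A pattern with variables against a ground term.
print(unify(("ref", "pro", "Lexeme", "Number"), ("ref", "pro", "she", "sing")))
# A clash on the second position makes unification fail.
print(unify(("ref", "pro", "Lexeme"), ("ref", "refl", "she")))
```

Adding a further constraint amounts to calling `unify` again with the substitution returned so far, which is the incremental behaviour described in the text.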

2.2 Overview of the CLE Components

The following constructions are used to cover English syntax, semantics, referring expressions, and ellipsis² in the CLE [3]:

Major clause types: declaratives, imperatives, wh- and yes-no questions, relatives, passives, clefts, there-clauses.
Verb phrases: complement subcategorization, control verbs, verb particles, auxiliaries, tense operators, some adverbials.
Noun phrases: prenominal and postnominal modifiers, lexical and phrasal quantifiers/specifiers.
Coordination: conjunctions and disjunctions of a wide class of noun phrases, verb phrases, and clauses; adjectival, nominal, and adverbial comparatives.
Anaphoric expressions: definite descriptions, reflexive and nonreflexive pronouns, bound variable anaphora, implicit relations.
Ellipsis: 'one'-anaphora, intrasentential and intersentential verb phrase ellipsis, follow-on questions.
Morphology: inflectional morphology, simple productive cases of derivational morphology, special form tokens.

CLE handles linguistic analysis in four processing phases: lexical analysis (segmentation), morphology, syntactic analysis, and semantic analysis. Further disambiguation and contextual interpretation are carried out by four more phases: sortal filtering (which sorts the different parses produced by ambiguous sentences), quantifier scoping (which resolves the scope of quantifiers such as every and all, as in 'all patients in every room'), reference (and ellipsis) resolution (when a sentence is grammatically incomplete, this phase finds out what it refers to), and plausibility checking (which applies linguistic and domain-based constraints in the form of sortal restrictions). Inputs to the CLE analysis and interpretation phases are shown in Figure 2.2 [3]. In the lexical analysis phase, words are segmented into stems and affixes, and other token-related tasks are carried out, such as spelling correction and the recognition of open-ended tokens like dates and numbers.

² Ellipses are recoverable fragments such as "on the table", "Will you?" or "I saw Mary before you did." Ellipsis is natural in natural language discourse.

[Figure 2.2: Inputs to CLE processing phases in analysis direction. Each phase is listed with its inputs in parentheses and the result it passes on:
sentence
SEGMENTATION (segmentation rules): segmentations
MORPHOLOGICAL ANALYSIS (morphological rules, stem categories, stem senses): word analyses
SYNTACTIC ANALYSIS (syntax rules): packed parse trees
SEMANTIC ANALYSIS (semantic interpretation rules): packed QLFs
SORTAL FILTERING (sortal restrictions, preference metrics): well-sorted QLFs
QUANTIFIER SCOPING (scoping rules): scoped QLFs
REFERENCE RESOLUTION (resolution rules, salience weights, application context): resolved QLFs
PLAUSIBILITY JUDGEMENT (sortal restrictions, linguistic constraints, application context): single logical form]

Syntactic analyses are produced by bottom-up parsing controlled in a top-down way. That is, the syntactic analysis is built bottom-up by the parser, but before basing any further analysis on a proposed constituent, the parser checks that the constituent is compatible with the analysis of the preceding part of the sentence.

The CLE performs the semantic analysis phase as follows. The system traces the syntactic analyses down from the start symbol, looks up the word senses of all the lexical items, and applies the semantic rules to all the complex constituents that appear in a complete sentence analysis. It then puts into a database "semantic constituent" and "semantic analysis" records, which encode a packed representation of the QLFs for the sentence. To process a constituent, the system finds a syntactic analysis for the constituent in the database and recursively computes a semantic analysis for all the daughter constituents in that analysis. It then looks for a semantic rule that corresponds to the syntax rule for the analysis and is compatible with the semantic analyses of the daughter constituents. If there is such a semantic rule, the mother category and its QLF are extracted from the rule and added to the database as a semantic constituent record and a semantic analysis record.

2.3 Quasi-Logical Forms

This section reviews QLFs as described in the technical report by Alshawi, Carter, Rayner [2]. QLF is a contextually sensitive logical form representation. Because QLFs are far enough removed from surface linguistic form, they provide the flexibility required by cross-linguistic differences. In addition, because there is no need to reason about the domain or context, the unification-based processing used in creating them can be carried out efficiently. At the same time, the QLF language has constructs for the explicit representation of contextually sensitive aspects of interpretation.

The CLE generates representations corresponding to successive phases of linguistic analysis while processing a sentence. These representations are: orthographic analysis, morphological analysis, syntactic analysis, unscoped quasi-logical forms, scoped quasi-logical forms, resolved quasi-logical forms and logical forms. In this project, in order to preserve the ambiguity present in most English sentences, we chose to make the Case Frame conversion at the unscoped QLF level.

2.3.1 Syntax of the QLF Language

The QLF language has the following BNF. BNF rules for QLF differ slightly between applications, and between RQLFs (reduced QLFs) and QLFs.

[BNF grammar of the QLF language. The nonterminal symbols did not survive in this copy; the surviving terminals are form( , , , ), term( , , , , , ), [apply, ...] and v( ).]

[negative:RESTo], where [REST ==> RESTo].

However, when this rule is used, the negative/positive information and the information on the verb are represented at different levels in the resulting case frame. This is largely a stylistic choice, and representation at the same level has been chosen.

The CLE also uses the form construct in cases not related to verbs. One example is prepositional phrases:

    pp ::
        form(_WList, prep(Prep), _, B^[B,Var,ARG2], _)
        ==>
        [pp:[pform:Prep, arguments:[arg1=Var,ARG2o]]]
        where
        [ARG2 ==> ARG2o].
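To make the shape of this rule easier to follow, here is a rough Python rendering of the same pattern match. It is not the thesis's rule interpreter: the tuple encoding of a form, the function name `map_pp`, the `convert` callback (standing in for the recursive `ARG2 ==> ARG2o` step) and the sample input are all assumptions introduced for the illustration. The lambda-bound variable B is dropped, so the body is just the pair (Var, ARG2).

```python
def map_pp(form, convert):
    """Map a prepositional-phrase form to a case-frame fragment, or None.

    `form` is assumed to be a 6-tuple:
    ("form", word_list, category, index, body, features).
    """
    tag, _words, category, _index, body, _features = form
    if tag != "form" or category[0] != "prep":
        return None  # the rule only fires for prep(...) categories
    prep = category[1]
    var, arg2 = body
    # arg1 is passed through unchanged; ARG2 is converted recursively.
    return {"pp": {"pform": prep, "arguments": [("arg1", var), convert(arg2)]}}


# Hypothetical input for a PP headed by "on"; the inner argument is a
# placeholder string, and the converter just wraps it.
sample = ("form", ["on", "the", "table"], ("prep", "on"), None, ("X", "table_term"), None)
print(map_pp(sample, convert=lambda arg: {"converted": arg}))
```

A form whose category is not `prep(...)` simply returns `None`, which corresponds to the rule failing to unify so that another rule can be tried.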

Another example is the case of genitives and possessors. In these cases the information is given explicitly in the second argument of the form construct: it is `poss' for possessors and `genit' for genitives. The rules concerning genitives and possessors can be found in Appendix B. Other cases are ellipsis, noun conjunctions, paths and compound nouns.

3.2 The term Construct

Terms represent the meaning of noun phrases. They have a format resembling forms:

1. List of words as in form (i.e. l([...])).
2. A category. These are shown in Table 3.2. The argument tpc means that the NP is a topic, i.e., it is either in subject position or fronted. It is there for quantifier scoping purposes, and that position in the category will be uninstantiated.
3. An index used for reference resolution purposes. The indices of NPs are threaded throughout a sentence so that intra-sentential anaphora resolution can use them.
4. A restriction for the meaning of the N constituent.
5. A metavariable for a referent after resolution. These can have various forms reflecting different interpretation possibilities.

Table 3.2: Noun categories in CLE

    For referential NPs:
        ref(pro, _, sing/plur, _)       for pronouns
        ref(def, _, sing/plur, _)       for the
        ref(refl, _, sing/plur, _)      for reflexive pronouns
        ref(pass_agent)                 for passive agents

    For quantifiers and indefinites:
        q(tpc, _, sing/plur/mass)       for quantified NP

    For names:
        proper_name(tpc)                for proper names

    For times:
        time(TIME)

    For order nouns:
        ord()

An example is pronouns. For pronouns the rule is

    pronouns ::
        term(l([Lexeme]),
             ref(pro,Lexemeo,NUMBER,_Ant),
             _Idx, _Restriction, _Qnt,
             np:[focus=Focus,mass=Mass])    % i.e. a real np
        ==>
        [class:pronoun, number:NUMBERo, root:Lexeme, mass:Mass, focus:Focus]
        where
        [NUMBER ==> NUMBERo].

Here, in order to unify, the term construct has to contain the predicate ref(...) and the first atom in this predicate has to be pro. Similarly, the term construct for the reflexive pronoun unifies when the first atom of the ref predicate is refl:

    reflexive_pronouns ::
        term(l([Lexeme]),
             ref(refl,Lexemeo,NUMBER,_Ant),
             _Idx, _Restriction, _Qnt,
             np:[focus=Focus,mass=Mass])
        ==>
        [class:reflexive_pronoun, number:NUMBERo, root:Lexeme, mass:Mass, focus:Focus]
        where
        [NUMBER ==> NUMBERo].

Here is an example showing both, for the sentence 'She loves herself':

    [dcl,
     form(l([she,loves,herself]),verb(pres,no,no,no,y),A,
          B^
          [B,
           [love_Like,A,
            term(l([she]),ref(pro,she,sing,l([])),C,
                 D^[sex_GenderOf,Female,D],_,
                 np:[focus=subj,mass=n]),
            term(l([herself]),ref(refl,she,sing,l([C-sing])),_,
                 E^[sex_GenderOf,Female,E],_,
                 np:[focus=not_tpc,mass=n])]],
          s:[inv=n,whmoved=n])]

==>

    [mood:      declarative,
     voice:     active,
     verb:      [root:        love_Like,
                 tense:       pres,
                 perfect:     no,
                 progressive: no,
                 negation:    positive,
                 modality:    no],
     arguments: [arg1: [class:  pronoun,
                        number: singular,
                        root:   she,
                        mass:   n,
                        focus:  subj],
                 arg2: [class:  reflexive_pronoun,
                        number: singular,
                        root:   herself,
                        mass:   n,
                        focus:  not_tpc]]]
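The two term rules above differ only in the first atom of the ref category, which suggests a table-driven rendering. The following Python sketch illustrates that idea; it is not the thesis's Prolog rules. The 7-tuple encoding of a term, the names `map_ref_term`, `REF_CLASS` and `NUMBER`, and the `plur` to `plural` entry are assumptions (only `sing` to `singular` is attested in the example above). As in the rules, the root is taken from the word list, not from the ref category.

```python
# Attested in the example above: sing -> singular. The plural entry is assumed.
NUMBER = {"sing": "singular", "plur": "plural"}
REF_CLASS = {"pro": "pronoun", "refl": "reflexive_pronoun"}


def map_ref_term(term):
    """Map a pronoun or reflexive-pronoun term to a case-frame fragment.

    `term` is assumed to be a 7-tuple:
    ("term", words, category, index, restriction, quantifier, np_features).
    Returns None when neither rule applies.
    """
    tag, words, category, _idx, _restriction, _qnt, features = term
    if tag != "term" or category[0] != "ref" or category[1] not in REF_CLASS:
        return None
    _, kind, _lexeme, number, _antecedents = category
    return {
        "class": REF_CLASS[kind],
        "number": NUMBER[number],
        "root": words[0],
        "mass": features["mass"],
        "focus": features["focus"],
    }


she = ("term", ["she"], ("ref", "pro", "she", "sing", []), "C", None, None,
       {"focus": "subj", "mass": "n"})
herself = ("term", ["herself"], ("ref", "refl", "she", "sing", ["C-sing"]), None,
           None, None, {"focus": "not_tpc", "mass": "n"})
print(map_ref_term(she))
print(map_ref_term(herself))
```

With these inputs the two results carry the same attribute values as arg1 and arg2 in the case frame shown above.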

Other nouns also have rules very similar to the ones above. In most cases the only difference is the second argument of the term construct, as shown in Table 3.2. Besides pronouns, proper names and reflexive pronouns, these are demonstrative noun phrases, definite noun phrases, quantified noun phrases, compound nouns, time noun phrases and ordered nouns.

3.3 The Structure of the QLF

As mentioned above, some of the grammatical information in QLFs is conveyed via the structure of the QLFs. One example is the sentence type indicators. QLF formulae of sentences which have been parsed as complete sentences start with a sentence type indicator. For instance, a QLF starting with dcl means that a declarative reading has been assigned to the sentence. Table 3.3 shows the sentence type indicators in QLF. Only strings which have been parsed as complete sentences get these indicators. Some sentences containing verbs which take sentential complements may produce QLFs with another sentence type indicator around the complement.

Table 3.3: Sentence Type Indicators in CLE

    dcl    for declarative sentence
    ynq    for yes/no question
    whq    for wh- question
    imp    for imperative sentence

These type indicators do not necessarily correspond directly to the speech act; they are based solely on syntactic form. In the QLF-to-CF module the first four rules deal with the top-level `speech act' (sentence mood) functors:

    dcl :: [dcl,REST] ==> [mood:declarative | RESTo]
        where [REST ==> RESTo].
    imp :: [imp,REST] ==> [mood:imperative | RESTo]
        where [REST ==> RESTo].
    ynq :: [ynq,REST] ==> [mood:interrogative | RESTo]
        where [REST ==> RESTo].
    whq_int :: [whq,REST] ==> [mood:interrogative | RESTo]
        where [REST ==> RESTo].

The `o' suffix on the right-hand side of the mapping (i.e. RESTo) marks the converted form of REST, which the where clause obtains by applying the mapping rules to REST in turn. For example, the QLF for the sentence 'John is sleeping'

    [dcl,
     form(l([John,is,sleeping]),
          verb(pres,no,yes,no,y),A,
          B^
          [B,
           [sleep_BeNaturallyUnconscious,A,
            term(l([John]),proper_name(tpc),_,C^[name_of,C,John],_,
                 np:[focus=subj,mass=_])]],
          s:[inv=n,whmoved=n])]

is converted to the CF:

    [mood:      declarative,
     voice:     active,
     verb:      [root:        sleep_BeNaturallyUnconscious,
                 tense:       pres,
                 perfect:     no,
                 progressive: yes,
                 negation:    positive,
                 modality:    no],
     arguments: [arg1: [class: proper_name,
                        topic: tpc,
                        root:  John,
                        mass:
                        focus: subj]]]
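The four top-level rules amount to a lookup on the sentence type indicator followed by conversion of the rest of the QLF. A minimal Python sketch of that behaviour follows; it is an illustration only, not the Prolog rule set. The names `MOOD`, `map_sentence` and `toy_convert` are hypothetical, the case frame is modeled as a dict rather than a Prolog list, and `toy_convert` is a stub standing in for all the remaining rules.

```python
# Indicator-to-mood table taken from the four rules above.
MOOD = {
    "dcl": "declarative",
    "imp": "imperative",
    "ynq": "interrogative",
    "whq": "interrogative",
}


def map_sentence(qlf, convert_rest):
    """Apply the top-level mood rule to a two-element QLF [indicator, rest]."""
    indicator, rest = qlf
    if indicator not in MOOD:
        return None  # not parsed as a complete sentence
    frame = {"mood": MOOD[indicator]}
    # Mirrors [mood:... | RESTo]: the converted rest supplies the other slots.
    frame.update(convert_rest(rest))
    return frame


def toy_convert(rest):
    # Stub for the remaining mapping rules: records the predicate name only.
    return {"verb": {"root": rest}}


print(map_sentence(["dcl", "sleep_BeNaturallyUnconscious"], toy_convert))
print(map_sentence(["whq", "sleep_BeNaturallyUnconscious"], toy_convert))
```

The two calls differ only in the indicator, so the resulting frames differ only in the mood slot, which is the relationship between a declarative sentence and its wh- question counterpart.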

Similarly, the sentence 'who is sleeping' produces a QLF and a CF with a structure similar to the sentence above, with dcl changed to whq and mood:declarative changed to mood:interrogative.

    [whq,
     form(l([who,is,sleeping]),verb(pres,no,yes,no,y),A,
          B^
          [B,
           [sleep_BeNaturallyUnconscious,A,
            term(l([who]),q(tpc,wh,_),_,C^[personal,C],_,
                 np:[focus=subj,mass=n])]],
          s:[inv=n,whmoved=n])]

==>

    [mood:      interrogative,
     voice:     active,
     verb:      [root:        sleep_BeNaturallyUnconscious,
                 tense:       pres,
                 perfect:     no,
                 progressive: yes,
                 negation:    positive,
                 modality:    no],
     arguments: [arg1: [class:      common,
                        number:
                        determiner: wh,
                        mass:       n,
                        focus:      subj,
                        var:
                        nominal:    personal]]]

As an example of a sentence containing a verb with a sentential complement, 'I wonder what time it is' gives the QLF

    [dcl,
     form(l([I,wonder,what,time,it,is]),verb(pres,no,no,no,y),A,
     . . .
       [whq,
        form(l([what,time,it,is]),verb(pres,no,no,no,y),E,
        . . .

and a case frame:

    [ mood:      declarative
      voice:     active
      verb:      ...
      arguments: [ subj: ...
                   obj:  [ mood:      interrogative
                           voice:     active
                           verb:      ...
                           arguments: [ subj: ...
                                        obj:  ... ] ] ] ]

Another example of gathering information from the structure of the QLF is the thematic role information. Below is the format of the rewriting rules for transitivity and thematic role information. The predicate itself is ignored; it has already been transferred at the form level. For the verb `be' another rule has been written, and it also ignores the predicate for the same reason. The thematic role information is extracted by taking the order of the arguments into account.

    type_Print :: [type_Print,_Idx,ARG1,ARG2]
        ==> [agent:ARG1o, patient:ARG2o]
        where [ARG1 ==> ARG1o, ARG2 ==> ARG2o].

and these are the rules for verb transitivity:

    intransitives :: [VERB,_Idx,ARG1]
        ==> [subj:ARG1o]
        where [is_a_qlf_constant(VERB), ARG1 ==> ARG1o].

    transitives :: [VERB,_Idx,ARG1,ARG2]
        ==> [subj:ARG1o, obj:ARG2o]
        where [is_a_qlf_constant(VERB), ARG1 ==> ARG1o, ARG2 ==> ARG2o].

    ditransitives :: [VERB,_Idx,ARG1,ARG2,ARG3]
        ==> [subj:ARG1o, obj:ARG2o, ind_obj:ARG3o]
        where [is_a_qlf_constant(VERB), ARG1 ==> ARG1o, ARG2 ==> ARG2o,
               ARG3 ==> ARG3o].

For example, the sentence 'John gave Mary the book':

    [dcl,
     form(l([John,gave,Mary,the,book]),verb(past,no,no,no,y),A,
          B^
          [B,
           [give_EndowWith,A,
            term(l([John]),proper_name(tpc),C,D^[name_of,D,John],_,
                 np:[focus=subj,mass=_]),
            term(l([the,book]),ref(def,the,sing,l([E-sing,C-sing])),
                 _,F^[book_BoundPaper,F],_,
                 np:[focus=not_tpc,mass=n]),
            term(l([Mary]),proper_name(not_tpc),E,G^[name_of,G,Mary],_,
                 np:[focus=not_tpc,mass=_])]],
          s:[inv=n,whmoved=n])]

==>

    [ mood:      declarative
      voice:     active
      verb:      [ root:        give_EndowWith
                   tense:       past
                   perfect:     no
                   progressive: no
                   negation:    positive
                   modality:    no ]
      arguments: [ subj:    [ class: proper_name
                              topic: tpc
                              root:  John
                              mass:
                              focus: subj ]
                   obj:     [ class:      common
                              number:     singular
                              determiner: the
                              mass:       n
                              focus:      not_tpc
                              var:
                              nominal:    book_BoundPaper ]
                   ind_obj: [ class: proper_name
                              topic: not_tpc
                              root:  Mary
                              mass:
                              focus: not_tpc ] ] ]
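The transitivity rules assign grammatical roles purely by argument position. A minimal Python sketch of that idea follows, assuming a predication is a list of the form [verb, event index, arg1, ...]. It is an illustration rather than the thesis code: `ROLE_NAMES`, `map_predication` and `map_arg` are hypothetical names, and the extra test on the third argument that the rule in Appendix B uses to separate ditransitives from relative clauses is left out.

```python
# Illustrative sketch only; the real rules are Prolog rewrite rules.
# ROLE_NAMES, map_predication and map_arg are hypothetical names.

def is_a_qlf_constant(word):
    """Mirror of the Prolog test: an atom whose name contains '_'."""
    return isinstance(word, str) and "_" in word


# Role labels indexed by the number of arguments after the event index.
ROLE_NAMES = {
    1: ["subj"],
    2: ["subj", "obj"],
    3: ["subj", "obj", "ind_obj"],
}


def map_predication(pred, map_arg):
    """Map [VERB, _Idx, ARG1, ...] to a list of (role, mapped argument)."""
    verb, _idx, *args = pred
    if not is_a_qlf_constant(verb) or len(args) not in ROLE_NAMES:
        return None  # none of the three rules applies
    roles = ROLE_NAMES[len(args)]
    return [(role, map_arg(arg)) for role, arg in zip(roles, args)]


if __name__ == "__main__":
    pred = ["give_EndowWith", "A", "John", "the book", "Mary"]
    print(map_predication(pred, lambda arg: arg))
```

Applied to the argument order of the QLF above (John, the book, Mary), the sketch yields subj, obj and ind_obj in that order, matching the case frame.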

Similarly, rules for handling relative clauses also use the structure of the QLF:

    np_rel_cl :: A^[and,NOM,[island,RELCLAUSE]]
        ==> [postmodifiers:[relcl:RELCLAUSEo] | NOMo]
        where [A^NOM ==> NOMo, RELCLAUSE ==> RELCLAUSEo].

    relli :: [VERB,_Idx,ARG1,ARG2,ARG3]
        ==> [subj:ARG1o,ind_obj:ARG2o,obj:ARG3o]
        where [is_a_qlf_constant(VERB), ARG1 ==> ARG1o, ARG2 ==> ARG2o,
               ARG3 ==> ARG3o].

This is also true for noun-modifying PPs and adjectives:

    np_pp :: A^[and,NOM,PP]
        ==> [postmodifiers:PPo | NOMo]
        where [PP = form(_,prep(_),_,_,_), A^NOM ==> NOMo, PP ==> PPo].

    np_adj :: A^[and,[Pred,A],[Adj1,A]]
        ==> [var:A,premodifiers:[Adj1],nominal:Pred].

The rules for handling the verb `be' are also different. They use the following QLF structure:

    be_predicative :: [be,_Idx,ARG1,X^[PROP,X]]
        ==> [be:predicative_be, subj:ARG1o, arg2:PROPo]
        where [ARG1 ==> ARG1o, PROP ==> PROPo].

    be_equative :: [be,_Idx,ARG1,X^[eq,X,PROP2]]
        ==> [be:equative_be, subj:ARG1o, arg2:PROP2o]
        where [ARG1 ==> ARG1o, PROP2 ==> PROP2o].

    be_proplubisey :: [be,_Idx,ARG1,X^PROP2]
        ==> [be:proplubisey, subj:ARG1o, arg2:PROP2o]
        where [ARG1 ==> ARG1o, PROP2 ==> PROP2o].
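The three `be' rules differ only in the shape of the lambda body in the last argument, and the most general pattern comes last. A Python sketch of this ordered matching is given below. It is an assumption-laden illustration, not the thesis code: the lambda term X^Body is encoded as a hypothetical tuple ("lambda", X, Body), `classify_be` is an invented name, the arguments are returned unmapped, and the label `pp_be` for the general case is the one used in Appendix B.

```python
# Illustrative sketch only; classify_be and the ("lambda", var, body)
# encoding of X^Body are assumptions made for this example.

def classify_be(pred):
    """Return the case frame features for a [be, _Idx, ARG1, X^Body] term."""
    head, _idx, arg1, (_lam, var, body) = pred
    if head != "be":
        return None
    # X^[PROP, X]: a one-place property applied to the bound variable.
    if isinstance(body, list) and len(body) == 2 and body[1] == var:
        return [("be", "predicative_be"), ("subj", arg1), ("arg2", body[0])]
    # X^[eq, X, PROP2]: identity between the subject and PROP2.
    if (isinstance(body, list) and len(body) == 3
            and body[0] == "eq" and body[1] == var):
        return [("be", "equative_be"), ("subj", arg1), ("arg2", body[2])]
    # X^PROP2: anything else, tried last because it matches every body.
    return [("be", "pp_be"), ("subj", arg1), ("arg2", body)]


if __name__ == "__main__":
    print(classify_be(["be", "A", "John", ("lambda", "X", ["tall_High", "X"])]))
    print(classify_be(["be", "A", "John", ("lambda", "X", ["eq", "X", "Mary"])]))
```

The order of the tests matters: the third pattern would also accept the bodies handled by the first two, so it can only serve as the fallback.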

The conditional (if) also uses the QLF structure. The part of the sentence with `if' produces a QLF starting with `[impl, ...':

    conditional :: [impl, ARG1, ARG2]
        ==> [cause:ARG1o, effect:ARG2o]
        where [ARG1 ==> ARG1o, ARG2 ==> ARG2o].

Gerunds, ordered nouns (e.g. first, second), occasion (e.g. times) and degree information (e.g. most) are also conveyed through the structure of the QLF.

Here is an example where information is gathered by using both the term construct and the structure of the QLF within that term construct. In QLF notation adjectival NP modifiers are translated as conjunctions. For 'the big black book':

    term(l([the,big,black,book]),ref(def,the,sing,l([])),_,
         B^
         [and,[and,[book_BoundPaper,B],[black_Coloured,B]],
              [big_Large,B]],
         _,np:[focus=subj,mass=n])

The transfer rule for adjectival NP modifiers is as follows:

    np_adj_adj :: A^[and,[and,[Pred,A],[Adj1,A]],[Adj2,A]]
        ==> [var:A,premodifiers:[Adj1,Adj2],nominal:Pred].

The above rule (and example) is for the case with two adjectives. The rule for only one adjective can be found in Appendix B. Two separate rules are used because of the difficulty of making a fully general implementation. The CF produced by this rule applied to the above example is:

    [ arguments: [ class:        common
                   number:       singular
                   determiner:   the
                   mass:         n
                   focus:        subj
                   var:
                   premodifiers: [ black_Coloured
                                   big_Large ]
                   nominal:      book_BoundPaper ] ]
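The two adjective rules both unwind a left-nested conjunction whose innermost left conjunct is the nominal. The following Python sketch shows that unwinding for an arbitrary number of adjectives. It only illustrates the shape of the data and makes no claim about what is feasible in the rule formalism of the module; `flatten_restriction` is a hypothetical name, and every right conjunct is assumed to be an [Adj, Var] pair, so relative clauses and PPs are not covered.

```python
# Illustrative sketch only; flatten_restriction is a hypothetical name.
# Assumes every right-hand conjunct has the form [Adj, Var].

def flatten_restriction(var, body):
    """Turn [and, [and, [Pred, V], [Adj1, V]], [Adj2, V]] into features."""
    adjectives = []
    while body[0] == "and":
        _and, inner, (adj, _v) = body
        adjectives.append(adj)   # outermost adjective is collected first
        body = inner
    pred, _v = body              # the most deeply embedded left conjunct
    adjectives.reverse()         # restore the order Adj1, Adj2, ...
    return [("var", var), ("premodifiers", adjectives), ("nominal", pred)]


if __name__ == "__main__":
    restriction = ["and",
                   ["and", ["book_BoundPaper", "B"], ["black_Coloured", "B"]],
                   ["big_Large", "B"]]
    print(flatten_restriction("B", restriction))
```

For the restriction of 'the big black book' this gives the premodifiers in the order black_Coloured, big_Large, the same order as in the case frame above. Unlike the rule for a bare noun in Appendix B, the sketch emits an empty premodifiers list when there is no adjective.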

CHAPTER 4 CONCLUSION

The first objective of this study was to design the QLF-to-CF module of this project. This has been largely accomplished and the product can be found in Appendix B. Further enhancements of the module will be necessary once the next module, the English Case Frame to Turkish Case Frame module, is finished and the testing of the MT system starts.

The most important problem faced in this study was the lack of documentation of the CLE system and the QLFs. This made it necessary to run a large number of tests on CLE with different sets of sentences covering almost all grammatical aspects of the English language. For every grammatical feature, a test had to be run on CLE and the resulting QLF had to be examined in detail in order to create its case frame counterpart, which had to include an explicit grammatical representation of the parsed sentence. The variety of grammatical features and the great number of exceptional cases that occur in a natural language made this task a very labor-intensive affair. This lack of documentation also made it necessary to visit SRI International at Cambridge, England, and meet with scientists who had participated in the creation of the CLE and had first-hand knowledge of the CLE and the QLFs.

Indeed, another objective of this study was to document the main guidelines of the CLE system and the Quasi Logical Forms so that a future user can easily comprehend the system and make the necessary revisions to the QLF-to-CF module. It is obvious that as the project evolves and nears completion, modifications to the system will be unavoidable. These modifications will be made with the help of the documentation provided by this study. All of the previous chapters and Appendices A, C and D serve this purpose. I believe that this study will fill the gap created by the inadequate documentation of the CLE.

Appendix B is the QLF-to-CF module, which is the main product of this study. As explained before, this module takes the grammatical information from the QLF by using either the term construct, the form construct, or the structure of the QLF. Chapters 2 and 3 have explained this in detail. Sources of further information are listed in References. This module was specifically designed to cover a large part of IBM computer manuals as its corpus. Provided that the necessary lexical entries are made, it can also achieve high rates of coverage in other domains.

Another point that has to be made concerns the problem of pronoun resolution. The rules for mapping pronouns are written assuming that pronoun resolution is not going to be done at this phase of the project. Even though CLE can perform pronoun resolution, the QLFs produced in this case have a recursive structure and the pronoun mapping rules written here are not coded to take this into account. Therefore any pronoun resolution has to be done as an intermediate step after the QLF-to-CF mapping is performed.

Future improvements to the system can be made by developing a more elaborate sorting method for the multiple parses produced by ambiguous sentences, one which takes the characteristics of the domain more into account. The great number of different parses produced by the CLE for some sentences (for instance sentences with relative clauses) was another problem faced in this study. Addressing it seems to be the immediate next step needed to further improve the system. If another domain is to be translated in the future, the QLF-to-CF module as well as the sorting algorithms may have to be changed.

REFERENCES

[1] Hiyan Alshawi, David Carter, Richard Crouch, Steve Pulman, Manny Rayner, and Arnold Smith. "Clare: A Contextual Reasoning and Cooperative Response Framework for the Core Language Engine, Final Report". Technical Report IEATP IED4/1/1165, SRI 8468, SRI International, Cambridge, UK, December 1992.
[2] Hiyan Alshawi, David Carter, Manny Rayner, and Björn Gambäck. "Translation by Quasi Logical Form Transfer". ACL Conference, 1991.
[3] Hiyan Alshawi and Robert C. Moore. "Introduction to CLE". In The Core Language Engine, chapter 1. MIT Press, Cambridge, Massachusetts, 1992.
[4] Winfield Bennett and Jonathan Slocum. "The LRC Machine Translation System". Computational Linguistics, 11(2-3), 1985.
[5] Çiğdem Keyder and Mehmet R. Tolun. "İngilizce'den Türkçe'ye Bilgi Tabanlı Otomatik Çeviri Sistemi". Bilişim Dergisi, (43-44), Ekim 1993.
[6] NATO Science Division. "'Turkish Natural Language Processing Initiative', Project Plan". Technical report, Middle East Technical University, Bilkent University and Halıcı Computing Inc., 1996.
[7] Bonnie Jean Dorr. Machine Translation. MIT Press, 1993.
[8] Kenneth Goodman and Sergei Nirenburg. The KBMT Project: A Case Study in Knowledge-Based Machine Translation. Morgan Kaufmann Publishers, San Mateo, California, 1991.
[9] Pierre Isabelle and Laurent Bourbeau. "TAUM-AVIATION: Its Technical Features and Some Experimental Results". Computational Linguistics, 11(1), 1985.
[10] Rod Johnson, Maghi King, and Louis des Tombe. "EUROTRA: A Multilingual System Under Development". Computational Linguistics, 11(2-3), 1985.
[11] Marvin Minsky. "A Framework For Representing Knowledge". In P. Winston, editor, The Psychology of Computer Vision. McGraw Hill, New York, 1975.
[12] Sergei Nirenburg, Jaime Carbonell, Masaru Tomita, and Kenneth Goodman. Machine Translation, A Knowledge-Based Approach. Morgan Kaufmann Publishers, 1992.
[13] J.A. Robinson. "Logic and Logic Programming". Communications of the ACM, 35, 1992.
[14] Zeki Sagay. "A Computer Translation of English to Turkish". Master's thesis, Department of Computer Engineering, Middle East Technical University, June 1981.
[15] Jan van Eijck and Hiyan Alshawi. "Logical Forms". In The Core Language Engine, chapter 2. MIT Press, Cambridge, Massachusetts, 1992.

APPENDIX A MODIFYING THE CLE

A.1 Running the System

To run the CLE system with Case Frame features one must run the executable cle/camcf/local/Run_man_es3. This will bring a Prolog prompt to the screen. The CLE and the QLF-to-CF mapping module run on Sicstus Prolog V.3. To load the lexicon file and the mapping rules from English to English case frame, the following Prolog command must be called:

    initialise_cf_trans.

These files can therefore be altered without the system needing to be remade. initialise_cf_trans also compiles rules which annotate QLFs with syntactic information, and sets appropriate switches.

There are two ways to run the QLF-to-CF system. In the first, the user goes into the CLE by typing

    s.

and then enters the sentence, ending with a dot. Then type y if the semantic analysis is well-formed. (You can check the semantic analysis by checking the place of the curly braces.) This will bring three forms: the QLF of the sentence, the CF of the sentence, and the grammatical structure of the sentence. To get more debugging information, go back into Prolog by typing

    .q

Then type

    rwdbug.

Go back into the CLE by typing

    s.

and continue. Debugging can be switched off by typing rwdbug. a second time.

The second option is to use the mapping without going into the main CLE loop by calling the Prolog predicate atom_to_case_frame(+Atom,-CaseFrame), e.g.

    atom_to_case_frame('Show me',C).

The system still has to be initialized as before. rwdbug also works as before.

A.2 Modifying the QLF-to-CF Module

The QLF-to-CF module file has the name manqlfrwrules.pl and it is in the man directory, which contains all the files concerning the IBM manuals and their translation. Within this directory there is a file named mantoploop.pl, which can also be found in Appendix D. This file contains some Prolog predicates used for loading the QLF-to-CF mapping rules (from the file manqlfrwrules.pl) and for loading the extra lexicon specific to IBM manuals. In order to load the mapping rules and lexicon, one must call the Prolog command

    initialise_cf_trans.

This loads the files which are under development, i.e. the lexicon and the mapping rules from English to English case frame. These files can therefore be altered without the system needing to be remade. If it is important to keep the original files when modifying the system, a copy of mantoploop.pl can be made under a different name (e.g. mytoploop.pl); this file may be modified instead and then consulted into the CLE when the Prolog prompt ':-' is on. The part which must be modified is the initialise_cf_trans command:

    initialise_cf_trans :- compile_and_load_qlf_annotation_rules,
        set_switch(store_qlf_trees_for_printing),
        consult('/home/users/cle/camcf/man/manqlfrwrules.pl'),
        s(".lf /home/users/cle/camcf/man/manextraleximp.pl"),
        s(".ici").

Notice that in line 3 the file containing the mapping rules is loaded into the system, so simply modifying this line may be enough. The command `initialise_cf_trans' must also be changed to a different name (e.g. init_my_cf_trans). Then, after running the CLE system, typing init_my_cf_trans at the Prolog prompt will load the new mapping rules into the CLE system. The fourth and fifth lines of initialise_cf_trans load the lexicon file and internalize it. If it is necessary to continuously update the mapping rules and reload them (which is natural when implementing the rules for the first time), another Prolog command such as

    initmytrans :-
        consult('$HOME/cle_related_directory/mytoploop.pl').

will save the user time, because it does not load the lexicon file and internalize it, which are tasks that take considerable time for the computer (and the user). But the user must not forget to call `initialise_cf_trans' or `init_my_cf_trans' when first loading, because these commands load the lexicon file and internalize it. This must be done at least once.

A.3 Making New Lexical Entries to the CLE System

New lexical entries to the CLE system can be made with a tool called `Lexmake', which is a Tcl/Tk-based graphical interface for the addition and modification of lexical entries. The lexical tool uses the command `Run_lexmake' with the arguments `eng' for English and `man' for the manual domain. For example, the user can run the command:

    ~cle/camcf/cam/bin/Run_lexmake eng man ~cle/camcf/man/manextraleximp.pl tem1lexicon.pl

which should take the file manextraleximp.pl and allow the user to edit it, sending the results to the file tem1lexicon.pl.

APPENDIX B QLF TO CF MAPPING RULES

/*---------------------------------------------------------------------
Rewriting rules for turning quasi logical forms (QLFs) to caseframes (CFs).
Bahadir Pehlivanturk modifies and extends the code written by
Steve Pulman, SRI International, July 1996

In these rules, I have used the convention that variables instantiated
by Prolog unification begin with an uppercase letter, whereas variables
that will be dealt with by a recursive rewrite, or which figure in some
condition, are all upper case. Note that prolog variables will therefore
be identical in both the QLF and the CF, whereas the other variables
will appear in the QLF in their input form XXXX and in the CF in their
output form XXXXo (`o' for `out') by convention.

The first four rules deal with the top level `speech act' functors:
----------------------------------------------------------------------*/

dcl :: [dcl,REST] ==> [mood:declarative | RESTo]
   where [REST ==> RESTo].

imp :: [imp,REST] ==> [mood:imperative | RESTo]
   where [REST ==> RESTo].

ynq :: [ynq,REST] ==> [mood:interrogative | RESTo]
   where [REST ==> RESTo].

whq_int :: [whq,REST] ==> [mood:interrogative | RESTo]
   where [REST ==> RESTo].

/*---------------------------------------------------------------------
Notice that since we need all the features to be at the same level of
nesting in the CF we splice in the recursion using `|'.
----------------------------------------------------------------------*/

/*---------------------------------------------------------------------
The next two rules deal with `form' constructs for verbs, translating
the various auxiliary features, and then recursing on the main
predication. Since we need to know the verb itself before the recursion
we use a pattern condition on the PROP variable to get at it. We also
use a simple conditional to avoid multiple cases of the same rule
(the transfer rule formalism used macros for this).
----------------------------------------------------------------------*/

verb_features ::
   form(_WList,
        verb(TENSE,Perf,Prog,MODALITY,PASSIVE),
        _Idx,
        B^[B,PROP],_Rfnt)
   ==>
   [voice:PASSIVEo,
    verb:[root:Verb, tense:TENSEo, perfect:Perf, progressive:Prog,
          negation:positive, modality:MODALITYo, fin/inf:FINo],
    arguments: PROPo]
   where
   [TENSE ==> TENSEo,
    MODALITY ==> MODALITYo,
    PROP ==> PROPo,
    if(PASSIVE = y, PASSIVEo = active, PASSIVEo = passive),
    if(TENSE = to_inf, FINo = infinitival,
       if(TENSE = no, FINo = untensed, FINo = finite)),
    PROP = [Verb | _Arguments]].

verb_features_with_not ::
   [not,form(_WList,
             verb(TENSE,Perf,Prog,MODALITY,PASSIVE),
             _Idx,
             B^[B,PROP],_Rfnt)]
   ==>
   [voice: PASSIVEo,
    verb:[root:Verb, tense:TENSEo, perfect:Perf, progressive:Prog,
          negation:negative, modality:MODALITYo, fin/inf:FINo],
    arguments:PROPo]
   where
   [TENSE ==> TENSEo,
    MODALITY ==> MODALITYo,
    PROP ==> PROPo,
    if(PASSIVE = y, PASSIVEo = active, PASSIVEo = passive),
    if(TENSE = to_inf, FINo = infinitival,
       if(TENSE = no, FINo = untensed, FINo = finite)),
    PROP= [Verb | _Arguments]].

/*---------------------------------------------------------------------
Now some example transfer rules for terms:
----------------------------------------------------------------------*/

dummy_imp_subj ::
   term(WORDSTRING,
        ref(pro,you,_,l([])),
        _Idx,
        D^[personal,D],
        _Qnt,
        atom_that_will_not_unify_with_a_category)
   ==>
   [class:pronoun, nominal:dummy_imp_subj]
   where
   [var(WORDSTRING)].

/*---------------------------------------------------------------------
In imperatives, there is a term for `you' that was not explicit in the
input, so probably should not be translated. This rule enables it to be
distinguished from real `you'.
----------------------------------------------------------------------*/

dummy_passive_subj ::
   term(WORDSTRING,
        ref(pass_agent),
        _Idx,
        D^[entity,D],
        _Qnt,
        atom_that_will_not_unify_with_a_category)
   ==>
   [class:pronoun, nominal:dummy_passive_subj]
   where
   [var(WORDSTRING)].

/*---------------------------------------------------------------------
This is an analogous rule for passive subjects. Note that both of these
rules test for the WORDSTRING being uninstantiated, usually a good sign
that this construct does not correspond to any words in the input.
----------------------------------------------------------------------*/

/*---------------------------------------------------------------------
The rule below is for the dummy comparative noun. It is used when a
dummy noun is needed in comparative sentences such as 'John is better'
(i.e. John is better than dummy_comp_nps).
----------------------------------------------------------------------*/

dummy_comp_nps ::
   term(_,comp_pro(_),_,Idx^[entity,Idx],_,_)
   ==>
   class:dummy_comp_nps.

/*--------------------------------------------------------------------
Here are the rules for various nouns that can occur in various NPs
---------------------------------------------------------------------*/

pronouns ::
   term(l([Lexeme]),                      % i.e. a real np
        ref(pro,Lexemeo,NUMBER,_Ant),
        _Idx,
        _Restriction,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:pronoun, number:NUMBERo, root:Lexemeo, mass:Mass, focus:Focus]
   where
   [NUMBER ==> NUMBERo].

/*-------------------------------------------------------------------*/

proper_names ::
   term(l([Lexeme]),                      % i.e. a real np
        proper_name(TPC),
        _Idx,
        _Restriction,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:proper_name, topic:TPCo, root:Lexeme, mass:Mass, focus:Focus]
   where
   [TPC ==> TPCo].

/*-------------------------------------------------------------------*/

reflexive_pronouns ::
   term(l([Lexeme]),
        ref(refl,Lexemeo,NUMBER,_Ant),
        _Idx,
        _Restriction,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:reflexive_pronoun, number:NUMBERo, root:Lexeme, mass:Mass,
    focus:Focus]
   where
   [NUMBER ==> NUMBERo].

/*-------------------------------------------------------------------*/

demonstrative_nps ::
   term(l([_Some|_Words]),
        ref(dem,Lexeme,NUMBER,_Antecedents),
        Idx,
        Idx^RESTRICTION,                  % unify Idx and lambda var
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:demonstrative_noun, number:NUMBERo, determiner:Lexeme,
    mass:Mass, focus:Focus | RESTRICTIONo]
   where
   [NUMBER ==> NUMBERo,
    Idx^RESTRICTION ==> RESTRICTIONo].

/*-------------------------------------------------------------------*/

definite_nps ::
   term(l([_Some|_Words]),
        ref(def,Lexeme,NUMBER,_Antecedents),
        Idx,
        Idx^RESTRICTION,                  % unify Idx and lambda var
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:definite_noun, number:NUMBERo, determiner:Lexeme,
    mass:Mass, focus:Focus | RESTRICTIONo]
   where
   [NUMBER ==> NUMBERo,
    Idx^RESTRICTION ==> RESTRICTIONo].

/*-------------------------------------------------------------------*/

quantified_np ::
   term(l([_Some|_Words]),
        q(_Tpc,Lexeme,NUMBER),
        Idx,
        Idx^RESTRICTION,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:quantified_noun, number:NUMBERo, determiner:Lexeme,
    mass:Mass, focus:Focus | RESTRICTIONo]
   where
   [NUMBER ==> NUMBERo,
    Idx^RESTRICTION ==> RESTRICTIONo].

/*-------------------------------------------------------------------*/

time_nps ::
   term(l([Some|Words]),
        time(TIME),
        _Idx,
        REST,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class: time_noun, time:TIMEo, rest:RESTo, mass:Mass, focus:Focus]
   where
   [TIME ==> TIMEo,
    REST ==> RESTo].

/*-------------------------------------------------------------------
For the above rule to function properly, the three rules below are also
needed for transferring the 'hour', 'minute' and 'day' information.
--------------------------------------------------------------------*/

hour_min ::
   Idx^[and,[hour_num,Idx,REST1],
            [and,[minute_num,Idx,REST2],[day_part,Idx,hours]]]
   ==>
   [hour:REST1o,min:REST2o]
   where
   [REST1 ==> REST1o,REST2 ==> REST2o].

hour_min ::
   Idx^[and,[hour_num,Idx,REST1],[minute_num,Idx,REST2]]
   ==>
   [hour:REST1o,min:REST2o]
   where
   [REST1 ==> REST1o,REST2 ==> REST2o].

day_name ::
   Idx^[day_name,Idx,Day]
   ==>
   Day.

/*-------------------------------------------------------------------*/

ordered_noun ::
   term(l([_Some|_Words]),
        ord(ref(def,Lexeme,NUMBER,_Antecedents),
            Idx^[Order,Idx],order,Num,NUMBER),
        _Idx,
        RESTRICTION,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:ordered_noun, number:NUMBERo, determiner:Lexeme, mass:Mass,
    focus:Focus, order:Order, num:Num | RESTRICTIONo]
   where
   [NUMBER ==> NUMBERo,
    RESTRICTION ==> RESTRICTIONo].

/*-------------------------------------------------------------------*/

order2 ::
   ord(ref(def,Lexeme,NUMBER,_Antecedents),
       Idx^[Order,Idx],order,Num,NUMBER)
   ==>
   [order:Order, number:NUMBERo, determiner:Lexeme, num:Num]
   where
   [NUMBER ==> NUMBERo].

/*-------------------------------------------------------------------*/

order ::
   ord(ref(def,Lexeme,NUMBER,_Antecedents),
       Idx1^Idx2^[degree,Idx3^[Order,Idx3],Idx1,Idx2],ordered,Num1,Num2)
   ==>
   [order:Order, num1:Num1o, num2:Num2o]
   where
   [Num1 ==> Num1o,
    Num2 ==> Num2o].

/*-------------------------------------------------------------------*/

/*---------------------------------------------------------------------
In these two rules we have to identify the Idx with the lambda-variable
of the restriction. If we do not, then the `var' representing the gap in
a relative clause will not be linked to the head noun. This kind of
thing seems to be done as a side effect of reference resolution in the
CLE for some reason.
----------------------------------------------------------------------*/

/*---------------------------------------------------------------------
QLFs are unsystematic in use of `sing' or `singular', etc:
----------------------------------------------------------------------*/

sing_singular :: sing ==> singular.
plur_plural   :: plur ==> plural.

/*---------------------------------------------------------------------
The restriction component of a term can be of various types. The
following rules deal with the most common cases: relative clauses,
pp postmodifiers, and adjective pre-modifiers:
----------------------------------------------------------------------*/

np_rel_cl ::
   A^[and,NOM,[island,RELCLAUSE]]
   ==>
   [postmodifiers:[relcl:RELCLAUSEo] | NOMo]
   where
   [A^NOM ==> NOMo,
    RELCLAUSE ==> RELCLAUSEo].

np_pp ::
   A^[and,NOM,PP]
   ==>
   [postmodifiers:PPo | NOMo]
   where
   [PP = form(_,prep(_),_,_,_),
    A^NOM ==> NOMo,
    PP ==> PPo].

/*---------------------------------------------------------------------
The following rule also handles some of the relative clauses.
----------------------------------------------------------------------*/

rel_term ::
   [VERB,_Idx,ARG1,ARG2,ARG3]
   ==>
   [subj:ARG1o,ind_obj:ARG2o,obj:ARG3o]
   where
   [is_a_qlf_constant(VERB),
    ARG1 ==> ARG1o,
    ARG2 ==> ARG2o,
    ARG3 ==> ARG3o].

/*---------------------------------------------------------------------
Assume that any other conjunction must be an Adj premodifier. Notice
that the actual nominal is always the most deeply embedded leftmost
conjunct. It is difficult to do this fully generally so we list the
cases of one and two premodifying adjectives.
----------------------------------------------------------------------*/

np_adj_adj ::
   A^[and,[and,[Pred,A],[Adj1,A]],[Adj2,A]]
   ==>
   [var:A,premodifiers:[Adj1,Adj2],nominal:Pred].

np_adj ::
   A^[and,[Pred,A],[Adj1,A]]
   ==>
   [var:A,premodifiers:[Adj1],nominal:Pred].

np_restriction_no_modifier ::
   A^[Pred,A]
   ==>
   [var:A,nominal:Pred].

/*---------------------------------------------------------------------
The following is a rule for adverbials (i.e. very, etc.)
----------------------------------------------------------------------*/

np_adverb_adj ::
   [Lex,Idx^[Root,Idx],Var]
   ==>
   [adverb:Lex, premodifier:Root].

np_adverb_adj ::
   A^[and,[Pred,A],[Adv,Idx^[Adj1,Idx],A]]
   ==>
   [var:A,adverb:Adv,premodifiers:[Adj1],nominal:Pred].

/*---------------------------------------------------------------------
the following is the rule for adverbs such as 'too'.
----------------------------------------------------------------------*/

adverb ::
   form(_WList,
        adv(Adv),
        _,
        Idx^[Idx,PROP],
        _Rfnt)
   ==>
   [adverbial:Adv | PROPo]
   where
   [PROP ==> PROPo].

/*---------------------------------------------------------------------
The restriction is a simple top level one.
----------------------------------------------------------------------*/

variables :: v(Var) ==> Var.

/*---------------------------------------------------------------------
Some variable args inside relatives or other constructs have a `v'
wrapped round them (to stop the generator floundering). This gets rid
of the semantically unimportant `v'.
----------------------------------------------------------------------*/

/*---------------------------------------------------------------------
A sample rewriting rule for verbs which can add thematic role info.
Notice we ignore the predicate itself, which has already been
transferred at the form level, via use of a pattern. We also ignore the
event index:

type_Print :: [type_Print,_Idx,ARG1,ARG2]
   ==>
   [agent:ARG1o, patient:ARG2o]
   where
   [ARG1 ==> ARG1o,
    ARG2 ==> ARG2o].

However, until we are sure that we need thematic role info, it is
simpler to write some general cases:
----------------------------------------------------------------------*/

intransitives ::
   [VERB,_Idx,ARG1]
   ==>
   [subj:ARG1o]
   where
   [is_a_qlf_constant(VERB),
    ARG1 ==> ARG1o].

transitives ::
   [VERB,_Idx,ARG1,ARG2]
   ==>
   [subj:ARG1o,obj:ARG2o]
   where
   [is_a_qlf_constant(VERB),
    ARG1 ==> ARG1o,
    ARG2 ==> ARG2o].

ditransitives ::
   [VERB,_Idx,ARG1,ARG2,ARG3]
   ==>
   [subj:ARG1o,obj:ARG2o,ind_obj:ARG3o]
   where
   [is_a_qlf_constant(VERB),
    ARG1 ==> ARG1o,
    ARG2 ==> ARG2o,
    ARG3 = term(_,_,_,_,_,_),   % This line is added to distinguish this
    ARG3 ==> ARG3o].            % rule from the rule for relative clause

/*---------------------------------------------------------------------
`form' constructs for PPs:
----------------------------------------------------------------------*/

pp ::
   form(_WList,
        prep(Prep),
        _,
        B^[B,Var,ARG2],
        _)
   ==>
   [pp:[pform:Prep, arguments:[arg1=Var,ARG2o]]]
   where
   [ARG2 ==> ARG2o].

is_a_qlf_constant(Word) :-
   atom(Word),
   contains_underscore(Word).

contains_underscore(Word) :-
   name(Word,List),
   member(95,List).

/*-------------------------------------------------------------------*/

/*-------------------------------------------------------------------
The verb 'be' has special properties so the CLE handles the verb 'be'
in a different manner which also had to be reflected in the QLF-to-CF
module via writing different rules for 'be'. The following three rules
are for three different types of 'be' that can occur in English
sentences.
-------------------------------------------------------------------*/

be_predicative ::
   [be,_Idx,ARG1,X^[PROP,X]]
   ==>
   [be:predicative_be, subj:ARG1o, arg2:PROPo]
   where
   [ARG1 ==> ARG1o,
    PROP ==> PROPo].

be_equative ::
   [be,_Idx,ARG1,X^[eq,X,PROP2]]
   ==>
   [be:equative_be, subj:ARG1o, arg2:PROP2o]
   where
   [ARG1 ==> ARG1o,
    PROP2 ==> PROP2o].

be_pp ::
   [be,_Idx,ARG1,X^PROP2]
   ==>
   [be:pp_be, subj:ARG1o, arg2:PROP2o]
   where
   [ARG1 ==> ARG1o,
    PROP2 ==> PROP2o].

/*---------------------------------------------------------------------
Noun and verb conjunctions are handled as follows. (i.e. and, or, etc.)
---------------------------------------------------------------------*/

noun_conj ::
   term(l([_Some|_Words]),
        conjdet(Topic,Conjtype,Phrase,Idx,_),
        Idx,
        D^REST,
        _Qnt,
        np:[focus=Focus,mass=Mass])
   ==>
   [class:noun_conjunction, conjtype:Conjtype, phrase:Phrase,
    topic:Topic, mass:Mass, focus:Focus, arguments:RESTo]
   where
   [D^REST ==> RESTo].

noun_conj_arguments ::
   form(_,
        conj(Phrase,Conjtype),
        _,
        X^[X,Var,PROP1,PROP2],
        _Rfnt)
   ==>
   [phrase:Phrase, conjtype:Conjtype, arg1:PROP1o, arg2:PROP2o]
   where
   [PROP1 ==> PROP1o,
    PROP2 ==> PROP2o].

/*-------------------------------------------------------------------*/

verb_conj ::
   form(_WList,
        conj(Phrase,Conjtype),
        _Idx,
        B^[B,PROP1,PROP2],
        _Rfnt)
   ==>
   [phrase:Phrase, conjtype:Conjtype, arg1:PROP1o, arg2:PROP2o]
   where
   [PROP1 ==> PROP1o,
    PROP2 ==> PROP2o].

/*-------------------------------------------------------------------*/

/*-------------------------------------------------------------------
Here is the rule for transferring compound nouns. This rule can work
recursively.
---------------------------------------------------------------------*/

compound_noun ::
   Idx1^form(_WList,
             nn,
             _,
             Idx2^[and,[ARG2,Idx1],[Idx2,Idx1,ARG1]],
             _Rfnt)
   ==>
   [compound_noun:arg1:ARG1o, arg2:ARG2o]
   where
   [ARG1 ==> ARG1o,
    ARG2 ==> ARG2o].

/* The first argument of a compound noun is a form of compound noun */

compound_arg1 ::
   term(_,
        q(_Tpc,Lexeme,NUMBER),
        Idx1,
        Idx^RESTRICTION,
        Qnt,
        qnt(Idx1))
   ==>
   [class:noun, number:NUMBERo, determiner:Lexeme, qnt:Qnt
    | RESTRICTIONo]
   where
   [NUMBER ==> NUMBERo,
    Idx^RESTRICTION ==> RESTRICTIONo].

/*-------------------------------------------------------------------*/

/*--------------------------------------------------------------------
It is rather awkward to handle ellipsis though the following two rules
make a quite successful attempt. These two are for different phrases.
If the ellipsis does not fit into any phrase type then CLE or the
QLF-to-CF module generally fails. However in real type there is never a
need to use such ellipsis.
---------------------------------------------------------------------*/

ellipsis :: form(_WList, ell(Phrase), _Idx, A^[A,PROP],_Rfnt) ==> [type : ellipsis, phrase_type:Phrase | PROP2o] where [PROP = Var^PROP2, PROP2 ==> PROP2o].

ellipsis ::
    form(_WList, ell(Phrase), _Idx, B^[B,PROP],_Rfnt)
    ==>
    [type : ellipsis, phrase_type:Phrase, arguments:PROPo]
    where
    [PROP ==> PROPo].
/*-------------------------------------------------------------------*/
/*-------------------------------------------------------------------
The following is the rule for the conditional (i.e. if). Notice that
the QLF starts with [impl,...], which is used in this rule.
---------------------------------------------------------------------*/
conditional ::
    [impl, ARG1, ARG2]
    ==>
    [cause:ARG1o, effect:ARG2o]
    where
    [ARG1 ==> ARG1o, ARG2 ==> ARG2o].
/*-------------------------------------------------------------------
The following rule became necessary for sentences like
'John made Mary sleep'.
---------------------------------------------------------------------*/
apply ::
    [apply,Idx^ARG1,ARG2]
    ==>
    [apply:arg1:ARG1o,arg2:ARG2o,var:Idx]
    where
    [ARG1 ==> ARG1o, ARG2 ==> ARG2o].
/*-------------------------------------------------------------------
Here is the rule for comparatives.
---------------------------------------------------------------------*/
comperative ::
    [more,
     _^Idx^[Degree,Idx,Level],
     Idx1^[degree,Idx2^[Adj,Idx2],Idx3,Idx1],
     Idx4^[degree,Idx2^[Adj,Idx2],REST,Idx4]]
    ==>
    [comperative_type:Degree, level:Level, adj:Adj, argument:RESTo]
    where
    [REST ==> RESTo].
/*-------------------------------------------------------------------
Repetition (e.g. times)
---------------------------------------------------------------------*/

occasion :: Idx^[and,[time_Occasion,Idx],REST] ==> [occasion: RESTo] where [REST ==> RESTo].

/*-------------------------------------------------------------------
Possessor (e.g. my, your, John's)
---------------------------------------------------------------------*/
poss_nps ::
    A^[and,[PRED,A],REST]
    ==>
    [var:A,nominal:PRED, possesed:RESTo]
    where
    [REST = form(_,poss,_,_,_), REST ==> RESTo].

possessor ::
    form(_, poss, _, Idx1^[and,[PRED,Idx2],[Idx1,Idx2,REST]], _)
    ==>
    [nominal:PRED, possessor:RESTo]
    where
    [REST ==> RESTo].
/*-------------------------------------------------------------------
Genitives (as in 'The wife of Bill').
---------------------------------------------------------------------*/
genit_nps ::
    A^[and,[NOM,A],GENIT]
    ==>
    [genitive:GENITo, nominal:NOMo]
    where
    [GENIT = form(_,genit(_),_,_,_), NOM ==> NOMo, GENIT ==> GENITo].

genit ::
    form(_WList, genit(Gen), _,
         Idx1^[and,[NOM,Idx2],[Idx1,Idx2,ARG1]], _)
    ==>
    [genit:[gform:Gen, nominal:NOMo, arguments:ARG1o]]
    where
    [NOM ==> NOMo, ARG1 ==> ARG1o].
/*------------------------------------------------------------------*/
path ::
    form(_WList, verb(TENSE,Perf,Prog,MODALITY,PASSIVE), Idx2,
         Idx1^[Idx1,REST,[Verb,Idx2,ARG1]], _Rfnt)
    ==>
    [voice:PASSIVEo,
     verb:[root:Verb, tense:TENSEo, perfect:Perf, progressive:Prog,
           negation:positive, modality:MODALITYo, fin/inf:FINo],
     arguments:path: RESTo, subj: ARG1o]
    where
    [TENSE ==> TENSEo, MODALITY ==> MODALITYo,
     if(PASSIVE = y, PASSIVEo = active, PASSIVEo = passive),
     if(TENSE = to_inf, FINo = infinitival,
        if(TENSE = no, FINo = untensed, FINo = finite)),
     REST ==> RESTo, ARG1 ==> ARG1o].
/*------------------------------------------------------------------*/
gerund ::
    Idx^[and,[eq,Idx,v(Var)],PROP]
    ==>
    [gerund:PROPo]
    where
    [PROP ==> PROPo].
/*------------------------------------------------------------------
Here is the rule for handling adjuncts.
--------------------------------------------------------------------*/
adjunct ::
    form(_WList, verb(TENSE,Perf,Prog,MODALITY,PASSIVE), Idx,
         B^[B,RESTRICTION,[Verb,Idx,ARG1,ARG2]],_Rfnt)
    ==>
    [voice: PASSIVEo,
     verb:[root:Verb, tense:TENSEo, perfect:Perf, progressive:Prog,
           negation:positive, modality:MODALITYo, fin/inf:FINo],
     adjuncts:RESTRICTIONo,
     arguments:[arg1: ARG1o, arg2: ARG2o]]
    where
    [TENSE ==> TENSEo, MODALITY ==> MODALITYo,
     if(PASSIVE = y, PASSIVEo = active, PASSIVEo = passive),
     if(TENSE = to_inf, FINo = infinitival,
        if(TENSE = no, FINo = untensed, FINo = finite)),
     RESTRICTION ==> RESTRICTIONo, ARG1 ==> ARG1o, ARG2 ==> ARG2o].
/*------------------------------------------------------------------*/
manner ::
    form(_WList, verb(TENSE,Perf,Prog,MODALITY,PASSIVE), _Idx,
         Idx1^[Idx1,[Manner,v(Idx2)],PROP], _Rfnt)
    ==>
    [voice:PASSIVEo,
     verb:[root:Verb, tense:TENSEo, perfect:Perf, progressive:Prog,
           negation:positive, modality:MODALITYo, fin/inf:FINo],
     manner:Manner, arguments: PROPo]
    where
    [TENSE ==> TENSEo, MODALITY ==> MODALITYo, PROP ==> PROPo,
     if(PASSIVE = y, PASSIVEo = active, PASSIVEo = passive),
     if(TENSE = to_inf, FINo = infinitival,
        if(TENSE = no, FINo = untensed, FINo = finite)),
     PROP = [Verb | _Arguments]].

manner_not ::
    [not, form(_WList, verb(TENSE,Perf,Prog,MODALITY,PASSIVE), _Idx,
               Idx1^[Idx1,[Manner,v(Idx2)],PROP], _Rfnt)]
    ==>
    [voice:PASSIVEo,
     verb:[root:Verb, tense:TENSEo, perfect:Perf, progressive:Prog,
           negation:negative, modality:MODALITYo, fin/inf:FINo],
     manner:Manner, arguments: PROPo]
    where
    [TENSE ==> TENSEo, MODALITY ==> MODALITYo, PROP ==> PROPo,
     if(PASSIVE = y, PASSIVEo = active, PASSIVEo = passive),
     if(TENSE = to_inf, FINo = infinitival,
        if(TENSE = no, FINo = untensed, FINo = finite)),
     PROP = [Verb | _Arguments]].
/*------------------------------------------------------------------*/
/* This part has a possible conflict with intransitives.
   The is_s_qlf_contant part has to be modified. */
in_order_to_ForPurpose ::
    [in_order_to_ForPurpose,v(Idx),REST]
    ==>
    [in_order_to_ForPurpose:RESTo]
    where
    [REST ==> RESTo].
/*------------------------------------------------------------------*/
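The voice and finiteness conditions are repeated verbatim in the path, adjunct, manner and manner_not rules. As a minimal sketch, and assuming that if(C,T,E) in those rules behaves as an ordinary if-then-else, the same two mappings can be written as stand-alone predicates. The names voice_value/2 and finiteness/2 are hypothetical and do not appear in the rule file.

```prolog
% voice_value(+Flag, -Voice)
% Hypothetical helper: the last slot of verb/5 is y for active clauses,
% as in verb(no,no,no,imp,y) which maps to voice:active.
voice_value(Flag, Voice) :-
    (   Flag = y
    ->  Voice = active
    ;   Voice = passive
    ).

% finiteness(+Tense, -Fin)
% Hypothetical helper mirroring the nested if/3 conditions:
% to_inf gives infinitival, no gives untensed, anything else finite.
finiteness(Tense, Fin) :-
    (   Tense = to_inf
    ->  Fin = infinitival
    ;   Tense = no
    ->  Fin = untensed
    ;   Fin = finite
    ).

% ?- voice_value(y, V).      % V = active
% ?- finiteness(no, F).      % F = untensed
% ?- finiteness(pres, F).    % F = finite
```

Like the original conditions, the sketch uses unification (=) in the tests, so an unbound first argument is bound by the first test rather than rejected.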


APPENDIX C REWRITE RULE INTERPRETER

/*---------------------------------------------------------------------
Rewrite.pl
Steve Pulman, SRI International, July 1996

This file defines a simple rewrite rule interpreter, along with an
equally simple tracing mechanism. The format of the rewriting rules
is defined by these operator definitions:
----------------------------------------------------------------------*/
:- op(1000,xfx, '::').
:- op(800,xfx,'==>').
:- op(900,xfx,'where').
/*---------------------------------------------------------------------
Rewrite rules have the form:

    <Id> :: <QLF> ==> <CaseFrame> where [<Condition>*].

or the form:

    <Id> :: <QLF> ==> <CaseFrame>.

for the case where there are no conditions. The semantics is that if
a QLF unifies with the <QLF> pattern and each <Condition> succeeds,
then the <CaseFrame> results. The conditions can be recursive
rewritings on components of the QLF, using the notation
QLF ==> CASEFRAME, or any Prolog defined predicate. The rules so far
only use a few extra unifications to destructure terms, and some
`var' or `nonvar' tests. Examples of the rewrite rules can be found
in manqlfrwrules.pl
----------------------------------------------------------------------*/


/*---------------------------------------------------------------------
qlf_rewrite(+QLF,-CaseFrame,-Trace)
-----------------------------------
Args
  QLF      : QLF is any QLF expression
  CaseFrame: a structure formed from lists of attribute:value pairs
  Trace    : tree of rewrite rule ids isomorphic to derivation
Succeeds at least once.
----------------------------------------------------------------------*/

qlf_rewrite(QLF,CaseFrame,[Id,Trace]) :-
    nonvar(QLF),
    (Id :: QLF ==> CaseFrame where Conditions),
    rwdbug(Id:'matches, checking conditions:'),
    qlf_rewrite_call(Conditions,Trace).
/*---------------------------------------------------------------------
Find a rule that matches the QLF, and try the conditions. Checking for
non-var status of QLF avoids circularities. Vars will be caught by the
last clause. rwdbug is a simple tracing mechanism
----------------------------------------------------------------------*/

qlf_rewrite(QLF,CaseFrame,Id) :-
    nonvar(QLF),
    (Id :: QLF ==> CaseFrame),
    rwdbug(Id:'matches, no conditions').
/*---------------------------------------------------------------------
Case for rule with no conditions
----------------------------------------------------------------------*/

qlf_rewrite(X,X,[id,X]) :-
    rwdbug('no match found, using id':X).
/*---------------------------------------------------------------------
Catch-all case to ensure we never fail.
----------------------------------------------------------------------*/

/*---------------------------------------------------------------------
qlf_rewrite_call(+ListOfConditions,-Trace)
Just goes down the list checking the conditions. Recursive calls of
the rewriter will build up the trace.
----------------------------------------------------------------------*/

qlf_rewrite_call([QLF ==> CaseFrame|Rest],[Trace|RestTrace]) :-
    rwdbug('trying to rewrite':QLF),
    qlf_rewrite(QLF,CaseFrame,Trace),
    !,
    qlf_rewrite_call(Rest,RestTrace).
/*---------------------------------------------------------------------
... condition triggers attempt at recursive rewriting
----------------------------------------------------------------------*/

qlf_rewrite_call([Cond|Rest],Trace) :-
    rwdbug('trying condition':Cond),
    call(Cond),
    qlf_rewrite_call(Rest,Trace).
/*---------------------------------------------------------------------
... some other Prolog test
----------------------------------------------------------------------*/

qlf_rewrite_call([],[]).

/*---------------------------------------------------------------------
rwdbug(+Msg) prints the message if `rwdebugging' is true.
----------------------------------------------------------------------*/
rwdbug(Msg) :-
    rwdebugging,
    write(Msg), nl, nl,
    !.
rwdbug(_).

:- dynamic rwdebugging/0.
/*---------------------------------------------------------------------
rwdbug/0 switches debugging on or off by asserting `rwdebugging' if
it is false, retracting it if it is true
----------------------------------------------------------------------*/

rwdbug :-
    rwdebugging,
    retract(rwdebugging),
    !.
rwdbug :-
    assert(rwdebugging).
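The rule format and its semantics can be tried out in isolation. The following is a stripped-down sketch, not the interpreter above: it keeps the three operator declarations but drops the trace argument and the rwdbug calls. The rules toy_impl and toy_rain and the predicates rewrite/2 and check/1 are hypothetical names made up for this illustration; they are not taken from manqlfrwrules.pl.

```prolog
% Operator declarations as in Rewrite.pl.
:- op(1000, xfx, '::').
:- op(800,  xfx, '==>').
:- op(900,  xfx, 'where').

% Two toy rules (hypothetical, for illustration only).
% A conditional rule: both arguments are rewritten recursively.
toy_impl :: [impl, A, B] ==> [cause:Ao, effect:Bo]
    where [A ==> Ao, B ==> Bo].
% An unconditional rule: a constant becomes an attribute:value list.
toy_rain :: rain ==> [event:rain].

% rewrite(+QLF, -CaseFrame)
% Simplified interpreter without tracing.
rewrite(QLF, CF) :-                 % rule with conditions
    nonvar(QLF),
    (_Id :: QLF ==> CF where Conds),
    check(Conds).
rewrite(QLF, CF) :-                 % rule without conditions
    nonvar(QLF),
    (_Id :: QLF ==> CF).
rewrite(X, X).                      % catch-all: term is left unchanged

% check(+Conditions)
% A condition is either a recursive rewrite or a plain Prolog goal.
check([]).
check([Q ==> C | Rest]) :- !,
    rewrite(Q, C), !,
    check(Rest).
check([Goal | Rest]) :-
    call(Goal),
    check(Rest).

% ?- rewrite([impl, rain, wet], CF).
% CF = [cause:[event:rain], effect:wet]
```

In the query, rain is rewritten by toy_rain, while wet matches no rule and is passed through by the catch-all clause, which is the same behaviour the third clause of qlf_rewrite/3 provides.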


APPENDIX D TOP LEVEL COMMANDS FOR PERFORMING ENGLISH TO ENGLISH CASE FRAME MAPPING

% Top level commands for English to English case frame.
% David Milward 24.7.96 adapting programs of Steve Pulman.

/*-----------------------------------------------------------------------
Predicate: initialise_cf_trans
Succeeds:  once
Comments:  initialises system for translation into case frames by
           compiling the qlf annotation rules, and loading an
           appropriate lexicon
-------------------------------------------------------------------------*/
initialise_cf_trans :-
    compile_and_load_qlf_annotation_rules,
    set_switch(store_qlf_trees_for_printing),
    % FINAL VERSION WILL LOAD LEXICON AND REWRITE RULES AUTOMATICALLY:
    % LINES BELOW ARE FOR DURING DEVELOPMENT
    consult('/home/users/cle/camcf/man/manqlfrwrules.pl'),
    s(".lf /home/users/cle/camcf/man/manextraleximp.pl"),
    s(".ici").

/*-----------------------------------------------------------------------
Predicate: atom_to_case_frame(+Atom,-CaseFrame)
Args:      Atom represents a sentence to be analysed
           e.g. 'Show me the files'
Succeeds:
Comments:  Creates a case frame representation for the sentence.
-------------------------------------------------------------------------*/
atom_to_case_frame(Atom,CaseFrame) :-
    atom_to_annotated_qlf(Atom,QLF),
    qlf_rewrite(QLF,CaseFrame,_Trace).

/*---------------------------------------------------------------------
This includes the QLF annotation in the CLE top loop:
----------------------------------------------------------------------*/
process_chosen_qlf_applic(QLF) :-
    qlf_words_tree_for_printing(QLF,_Words,Tree),
    sgp_add_cats_to_items_in_tree(Tree,_Pair,[_Rule/(QLF,_)|_]),
    pp_underscore(QLF),
    qlf_rewrite(QLF,CaseFrame,Trace),
    pp_underscore(CaseFrame),
    pp_underscore(Trace).


APPENDIX E EXAMPLE OUTPUTS OF THE QLF-TO-CF MODULE

The sentence 'Type the file on the computer' produces two different parses, and the CLE reports them as follows:

2 well-sorted semantic analyses.
Preference ranking.
Complete sentence with bracketing:
"{type {the file {on {the computer}}}}"
Word senses (unordered):

Confirm this analysis? (y/n/c/p/?): n
Complete sentence with bracketing:
"{type {the file} {on {the computer}}}"
Word senses (unordered):

Confirm this analysis? (y/n/c/p/?): y

In this example we choose the second parse, which produces the following QLF:

[imp,
 form(l([type,the,file,on,the,computer]),verb(no,no,no,imp,y),A,
      B^
      [B,
       form(l([on,the,computer]),prep(on),_,
            C^
            [C,v(A),
             term(l([the,computer]),
                  ref(def,the,sing,l([D-_,E-F])),_,
                  G^[computer_ComputingMachine,G],_,
                  np:[focus=not_tpc,mass=n])],
            _),
       [type_WriteMechanically,A,
        term(_,ref(pro,you,_,l([])),E,H^[personal,H],_,_),
        term(l([the,file]),ref(def,the,sing,l([E-F])),D,
             I^[file_PieceOfCode,I],_,
             np:[focus=not_tpc,mass=n])]],
      s:[inv=n,whmoved=n])]

The QLF-to-CF module produces the following Case Frame structure:

[mood:imperative,voice:active,
 verb:
   [root:type_WriteMechanically,tense:no,perfect:no,
    progressive:no,negation:positive,modality:imp,
    fin/inf:untensed],
 adjuncts:
   [pp:
      [pform:on,
       arguments:
         [arg1=v(_),
          [class:definite_noun,number:singular,determiner:the,
           mass:n,focus:not_tpc,var:_,
           nominal:computer_ComputingMachine]]]],
 arguments:
   [arg1:[class:pronoun,nominal:dummy_imp_subj],
    arg2:
      [class:definite_noun,number:singular,determiner:the,
       mass:n,focus:not_tpc,var:_,nominal:file_PieceOfCode]]]

or in graphical form (using the LaTeX AVM style):

[Figure: attribute-value matrix of the case frame, with attributes mood, voice, verb, adjuncts and arguments and the same values as in the list notation.]
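Since a case frame is a nested list of attribute:value pairs, a value can be reached by following a path of attribute names. The sketch below illustrates this on a trimmed copy of the case frame for 'Type the file on the computer'. The predicates cf_get/3 and demo_frame/1 are hypothetical helpers introduced only for this illustration; they are not part of the QLF-to-CF module.

```prolog
:- use_module(library(lists)).   % memberchk/2

% demo_frame(-CaseFrame)
% Trimmed copy of the example case frame (adjuncts and the fin/inf
% attribute are left out to keep the sketch short).
demo_frame([mood:imperative, voice:active,
            verb:[root:type_WriteMechanically, tense:no, perfect:no,
                  progressive:no, negation:positive, modality:imp],
            arguments:[arg1:[class:pronoun, nominal:dummy_imp_subj],
                       arg2:[class:definite_noun, number:singular,
                             determiner:the, mass:n, focus:not_tpc,
                             nominal:file_PieceOfCode]]]).

% cf_get(+CaseFrame, +Path, -Value)
% Follows a list of attribute names down the nested lists and returns
% the value found at the end of the path. Fails if an attribute is
% missing at any level.
cf_get(Value, [], Value).
cf_get(Frame, [Attr|Rest], Value) :-
    memberchk(Attr:Sub, Frame),
    cf_get(Sub, Rest, Value).

% ?- demo_frame(CF), cf_get(CF, [verb, root], R).
% R = type_WriteMechanically
% ?- demo_frame(CF), cf_get(CF, [arguments, arg2, nominal], N).
% N = file_PieceOfCode
```

memberchk/2 commits to the first matching attribute, which is adequate here because each attribute occurs at most once per level in the example.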

The following example also demonstrates the phases after the QLF-to-English-CF transfer and, at the end, gives the Turkish translation of the original English sentence.

English sentence input to the CLE:

This program produces a large file.

The CLE produces the following unresolved quasi-logical form:

[dcl,
 form(l([this,program,produces,a~,large,file]),
      verb(pres,no,no,no,y),
      A,
      B^
      [B,
       [produce_Make,A,
        term(l([this,program]),ref(dem,this,sing,l([])),_,
             C^[program_ComputerInstructions,C],_,
             np:[focus=subj,mass=n]),
        term(l([a~,large,file]),q(not_tpc,a,sing),_,
             D^[and,[file_PieceOfCode,D],[large_Sizeable,D]],_,
             np:[focus=not_tpc,mass=n])]],
      s:[inv=n,whmoved=n])]

This QLF is mapped by our QLF-to-English CF mapper to the following English CF representation:

mood:   declarative
voice:  active
verb:
    root:         produce_Make
    tense:        pres
    perfect:      no
    progressive:  no
    negation:     positive
    modality:     no
    fin/inf:      finite
arguments:
    subj:
        class:        demonstrative_noun
        number:       singular
        determiner:   this
        mass:         n
        focus:        subj
        var:          _5354
        nominal:      program_ComputerInstructions
    obj:
        class:        quantified_noun
        number:       singular
        determiner:   a
        mass:         n
        focus:        not_tpc
        var:          _5326
        premodifiers: large_Sizeable
        nominal:      file_PieceOfCode

This English case-frame representation is transferred to the following Turkish case frame using the transfer module:

((s-form finite)
 (clause-type predicative)
 (speech-act declarative)
 (voice ((active +)))
 (verb ((root "yarat")
        (tense present)
        (aspect aorist)
        (sense positive)))
 (arguments
  ((subject
    ((specifier ((demonstrative bu)))
     (referent ((arg ((concept "program")))
                (agr ((number singular) (person 3)))))))
   (dir-obj
    ((specifier ((determiner ((definite -) (referential +)))))
     (modifier ((qualitative ((p-name bUyUk)))))
     (referent ((arg ((concept "kUtUk")))
                (agr ((number singular) (person 3))))))))))


which is then processed by the Turkish generator to generate the intermediate form:

[[CAT=ADJ][ROOT=bu][TYPE=DETERMINER]]
[[CAT=NOUN][ROOT=program][AGR=3SG][POSS=NONE][CASE=NOM]]
[[CAT=ADJ][ROOT=bir][TYPE=DETERMINER]]
[[CAT=ADJ][ROOT=bUyUk]]
[[CAT=NOUN][ROOT=kUtUk][AGR=3SG][POSS=NONE][CASE=NOM]]
[[CAT=VERB][ROOT=yarat][SENSE=POS][TAM1=AORIST][AGR=3SG]]
[PERIOD]

which gets realized by the morphological generator as Turkish surface form:

Bu program bir bUyUk kUtUk yaratIr.


APPENDIX F TURKISH AND ENGLISH CASE FRAME STRUCTURES

F.1 Turkish Case Frame Structure

S-FORM       infinitive/adverbial/participle/finite
CLAUSE-TYPE  existential/attributive/predicative
VOICE        active/reflexive/reciprocal/passive/causative
SPEECH-ACT   imperative/optative/necessitative/wish/interrogative/declarative
QUES
    TYPE         yes-no/wh
    CONST        list-of(subject/dir-obj/etc.)
VERB
    ROOT         verb
    POLARITY     negative/positive
    TENSE        present/past/future
    ASPECT       progressive/habitual/etc.
    MODALITY     potentiality
ARGS
    SUBJECT      c-name
    DIR-OBJ      c-name
    SOURCE       c-name
    GOAL         c-name
    LOCATION     c-name
    BENEFICIARY  c-name
    INSTRUMENT   c-name
    VALUE        c-name
ADJN
    TIME         c-name
    PLACE        c-name
    MANNER       c-name
    PATH         c-name
    DURATION     c-name
CONTROL
    TOPIC        constituent
    FOCUS        constituent
    BACKGR       constituent

F.2 English Case Frame Structure

mood:
voice:
adverbial:
verb:
    root:
    tense:
    perfect:
    progressive:
    negation:
    modality:
    fin/inf:
adjuncts/manner/path/duration:
    pp:
        pform:
        arguments:
            arg1/arg2/..:
                class:
                number:
                determiner:
                mass:
                focus:
                var:
                nominal:
arguments:
    subj/obj/ind_obj:
        class:
        number:
        determiner:
        mass:
        focus:
        var:
        genitive:
        gerund:
        time:
        nominal:
