Creating high-quality, large-scale bilingual knowledge bases using minimal resources

Davide Turcato, Fred Popowich, Paul McFetridge, Janine Toole

gavagai Technology Inc.
P.O. 374, 3495 Cambie Street, Vancouver, British Columbia, V5Z 4R3, Canada
and
Natural Language Laboratory, School of Computing Science, Simon Fraser University
8888 University Drive, Burnaby, British Columbia, V5A 1S6, Canada
{turk,popowich,mcfet,toole}@cs.sfu.ca

Abstract

We describe a complete production cycle for the semi-automatic development of bilingual knowledge bases. The methodology is based on the optional availability of a source language monolingual corpus and on employing lexicographers who only need to be bilingual and computer literate. Bilingual resources (corpora, Machine Readable Dictionaries, etc.) are not required, and the employment of skilled and trained personnel is minimized, thus making the approach suitable for minority languages. The methodology is language independent and has been extensively used in generating an English-Spanish bilingual lexicon for Machine Translation.

1. Overview

We describe a complete production cycle for the semi-automatic development of bilingual knowledge bases. We use the term bilingual knowledge bases to refer to rich sources of bilingual knowledge, not just in terms of word and phrase equivalences, but also in terms of morphological and syntactic information, as well as functional dependencies (e.g. verb-complement relations) and their cross-linguistic mapping. In the spirit of (Sadler and Vendelmans, 1990), a bilingual knowledge base is, to some extent, unspecified for its application domain, and can be used for different purposes: Machine Translation (MT), Cross Linguistic Information Retrieval (CLIR), etc. We define bilingual knowledge bases as collections of Lexical Transfer Rules (LTRs), i.e. bilingual rules that only contain lexical items (and associated descriptions) on either side. All functional dependencies are expressed in terms of relations among lexical items, by way of indices (a schematic illustration is given at the end of this section). We argued in (Turcato et al., 1999b) that a collection of this kind can be interpreted in different ways (e.g. either side of a bilingual entry can be interpreted as a bag or a sequence), and can therefore be used (in MT) under different theoretical approaches and system architectures.

Our bilingual development methodology rests on two assumptions: (i) monolingual corpora are relatively easy to obtain, while bilingual corpora are not (the same holds, to some degree, for monolingual lexicons vs. bilingual lexicons); (ii) the use of highly skilled, well-trained personnel for long periods is, in most cases, beyond the capabilities of a minority language project. Hence, we propose an approach to semi-automatic development that relies only on the (optional) availability of a monolingual corpus and, when the intervention of a human developer is required, only requires that developer to be bilingual and computer literate.

The described methodology has been used for the creation of a bilingual lexicon for an English-Spanish Machine Translation system (Popowich et al., forthcoming). All the examples and results in the rest of the paper refer to that implementation. However, the methodology is language-independent, as will become apparent in the course of its description.
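To make the notion of LTR more concrete before the formalism itself is introduced in section 3.3, the following Python sketch models an LTR as two lists of lexical items, each pairing a word with a description macro and the variable indices that express functional dependencies, plus a list of transfer macros. The class and field names are our own illustrative choices under a deliberately simplified representation, not the data structures actually used in the system.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LexicalItem:
    word: str                  # e.g. "go" or "echar"
    label: str                 # handle referred to by transfer macros, e.g. "L"
    macro: str                 # description macro, e.g. "v_adj"
    indices: Tuple[str, ...]   # shared variables expressing dependencies

@dataclass
class TransferMacro:
    name: str                  # e.g. "trans_verb"
    args: Tuple[str, ...]      # labels of the descriptions it relates

@dataclass
class LTR:
    source: List[LexicalItem]              # English side
    target: List[LexicalItem]              # Spanish side
    transfer: List[TransferMacro] = field(default_factory=list)

# The entry "go bad = echarse a perder" (cf. example (1) in section 3.3),
# transcribed into this simplified representation:
go_bad = LTR(
    source=[LexicalItem("go", "L", "v_adj", ("A", "B", "C")),
            LexicalItem("bad", "LM", "adj", ("C",))],
    target=[LexicalItem("echar", "R", "v_acc", ("A", "B", "D")),
            LexicalItem("refl", "", "refl_pron", ("D",)),
            LexicalItem("a", "", "prep", ("A", "E")),
            LexicalItem("perder", "", "v_null", ("E",))],
    transfer=[TransferMacro("trans_verb", ("L", "R")),
              TransferMacro("trans_modifier", ("LM", "R"))])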

2. Pre-requisites

We take a bootstrap approach to bilingual development, i.e. an initial set of LTRs is used to draw evidence for the automatic generation of further LTRs from bilingual pairs. Such an initial knowledge base is the only resource that needs to be manually developed, and the only one that requires trained personnel with mastery of the formalism in use. However, in our experience, such a knowledge base can be very small. What really counts is its coverage in terms of different parts of speech and phrasal constructions, rather than its size in terms of number of entries.

A bootstrap bilingual knowledge base is used to extract bilingual templates, which in turn are used in generating new LTRs from simple bilingual pairs. A bilingual template is a skeletal LTR, unspecified for words. For a detailed discussion of bilingual templates, how they are generated and how they are used, we refer the reader to (Turcato, 1998) and (Turcato et al., 1999a). We only mention here that bilingual templates can be derived in different ways from a bootstrap bilingual knowledge base. A simpler way (which we call the enumerative approach) consists of removing words from LTRs, normalizing variables by renaming them in some canonical way (so that two instances of the same template cannot differ only by variable names), then ranking templates by frequency, if the application of some cutoff is in order. This approach requires practically no labor and yields acceptable results in terms of LTR generation; it is sketched at the end of this section. A more sophisticated approach (which we call the generative approach) consists of developing a recursive grammar for generating templates, thus yielding an infinite number of templates. This approach is more powerful and gives better coverage in generating LTRs, but requires some expert labor in developing a template grammar.

A bootstrap knowledge base is the only pre-existing bilingual resource required by the present methodology. As for monolingual resources, lexicons for both languages involved are needed in the LTR generation process. Finally, the optional source language collocation extraction phase requires a tagged monolingual corpus as its input.
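The enumerative approach amounts to little more than stripping words and canonicalizing variable names. The following Python sketch spells it out, representing an LTR, more compactly than in the sketch in section 1, as nested tuples of (word, macro, variables) items purely for illustration; the representation and function names are our own, not the system's.

from collections import Counter

# For illustration, an LTR is represented here as a triple
#   (source_items, target_items, transfer_macros)
# where a lexical item is (word, macro, variables) and a transfer macro is
# (name, variables); variables are plain strings such as "A" or "L".
GO_BAD = (
    (("go", "v_adj", ("L", "A", "B", "C")),
     ("bad", "adj", ("LM", "C"))),
    (("echar", "v_acc", ("R", "A", "B", "D")),
     ("refl", "refl_pron", ("D",)),
     ("a", "prep", ("A", "E")),
     ("perder", "v_null", ("E",))),
    (("trans_verb", ("L", "R")),
     ("trans_modifier", ("LM", "R"))),
)

def template(ltr):
    """Strip the words from an LTR and rename variables in order of first
    occurrence, so that two templates differing only in variable names
    collapse into one."""
    renaming = {}
    canon = lambda v: renaming.setdefault(v, "X%d" % (len(renaming) + 1))
    src, tgt, macros = ltr
    strip = lambda items: tuple((m, tuple(map(canon, vs))) for _, m, vs in items)
    return (strip(src), strip(tgt),
            tuple((name, tuple(map(canon, vs))) for name, vs in macros))

def enumerate_templates(bootstrap_ltrs, cutoff=1):
    """Rank canonical templates extracted from a bootstrap knowledge base by
    frequency, optionally applying a cutoff."""
    counts = Counter(template(ltr) for ltr in bootstrap_ltrs)
    return [(t, n) for t, n in counts.most_common() if n >= cutoff]

print(enumerate_templates([GO_BAD]))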

3. Production cycle

The semi-automatic bilingual knowledge base production cycle consists of four phases. We illustrate the production cycle by means of a worked example, showing how bilingual entries for the phrasal pattern go+adjective (e.g. go crazy) are generated.

3.1. Acquisition of source language expressions

The first phase aims at identifying the source language (SL) words and phrases that need to be included in a bilingual knowledge base. This task is driven by two guidelines:

1. Since translation is non-monotonic (i.e. a word can get different translations in increasingly larger contexts), it is necessary to identify multi-word expressions that do not translate compositionally, in order to create specific bilingual entries for them.

2. Since completeness is hardly achievable in a bilingual knowledge base, it is desirable to prioritize the most useful entries (i.e. entries that are likely to be used most often), in order to maximize the bilingual knowledge base coverage.

We developed a technique for automatically extracting the most useful words and collocations to be translated from a monolingual corpus (McDonald et al., 2000). The technique is based on statistical methods, but instead of relying on the standard notion of frequency of words and phrases, it relies on a notion of adjusted frequency, which takes into account the dispersion of words in a corpus. Words with a higher dispersion throughout the different texts comprising a corpus are assigned a higher adjusted frequency than words of equal frequency but lower dispersion. The rationale behind this approach is to prioritize items that are spread more evenly throughout a corpus over those concentrated in a small number of texts. This tool is used for extracting both single words and collocations to be coded (although, for brevity, we refer to it as a 'collocation extraction' tool). The notion of adjusted frequency is used both in ranking single words and in extracting and ranking collocations. The extraction task is accomplished by using a modified version of standard log-likelihood-based techniques, with frequencies replaced by adjusted frequencies (a sketch of this scheme is given at the end of this subsection). We incidentally note that the notion of collocation is defined on a monolingual basis, and does not coincide with the notion of 'phrase that cannot be translated compositionally', which is what we are after. The latter notion is inherently bilingual. However, from an empirical point of view, the two classes appear to have a significant intersection.

The input to word and collocation extraction is a tagged corpus. We also take advantage of the existence of a bootstrap bilingual knowledge base for the preliminary extraction of the most frequent lexical sequences found on the source half of the knowledge base (e.g. verb-adjective, adjective-noun, noun-noun). We assume such sequences represent the most common collocation classes, and use them to drive the collocation extraction process. Both contiguous and non-contiguous collocations can be extracted. So, given a tagged monolingual corpus and a phrasal pattern we are interested in (e.g. intransitive-verb + adjective), this production phase outputs a list of source language collocations belonging to that phrasal pattern and deemed to be useful for bilingual coding. The output list for the subset involving the verb go is shown in Figure (1) below.

541.18  go V ADJ  crazy ADJECTIVE
412.50  go V ADJ  wrong ADJECTIVE
227.41  go V ADJ  faster ADJECTIVE
150.24  go V ADJ  mad ADJECTIVE
138.76  go V ADJ  nuts ADJECTIVE
113.35  go V ADJ  great ADJECTIVE
103.83  go V ADJ  berserk ADJECTIVE
 94.42  go V ADJ  ballistic ADJECTIVE
 82.61  go V ADJ  bad ADJECTIVE
 46.59  go V ADJ  run ADJECTIVE
 43.00  go V ADJ  fabulous ADJECTIVE
 40.70  go V ADJ  undercover ADJECTIVE
 34.59  go V ADJ  free ADJECTIVE
 33.67  go V ADJ  right ADJECTIVE
 33.52  go V ADJ  insane ADJECTIVE
 33.32  go V ADJ  sour ADJECTIVE
 30.92  go V ADJ  flying ADJECTIVE
 29.46  go V ADJ  straight ADJECTIVE
 24.23  go V ADJ  critical ADJECTIVE
 22.19  go V ADJ  off ADJECTIVE
 21.07  go V ADJ  blind ADJECTIVE
 18.86  go V ADJ  soft ADJECTIVE

Figure 1: Extracted collocations for go+adjective

The number in the first column is a usefulness index we compute and use to rank collocations. For this specific pattern (go+adjective), 220 collocations were initially found in the corpus. The final list of 22 items was obtained by applying thresholds on: (i) the adjusted frequencies of single words appearing in collocations; (ii) the usefulness index shown above.

The collocation extraction procedure is entirely automatic. However, if no monolingual corpora or taggers are available, it can be replaced by other techniques, ranging from entirely automatic to entirely manual, depending on the available resources. For instance, in an earlier phase of our project, we used a hybrid approach, first automatically producing single word frequency lists from a tagged corpus, then having lexicographers manually look up multi-word phrases in a paper dictionary for the most frequent words. How this first phase of the production cycle is performed is immaterial to the rest of the cycle, as long as lists of plain source language words and phrases are produced.
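For concreteness, the Python sketch below shows one way the adjusted-frequency idea can be realized and used to rank verb+adjective candidates. The dispersion measure (Juilland's D), the contiguous-bigram matching and all function names are our own illustrative assumptions; the actual tool described in (McDonald et al., 2000) uses adjusted frequencies inside log-likelihood-based extraction and also handles non-contiguous collocations.

import math

def adjusted_frequency(per_text_counts):
    """Dispersion-adjusted frequency of an item (word or collocation), given
    its counts in each text of the corpus.  Items spread evenly across texts
    keep most of their raw frequency; items concentrated in few texts are
    demoted.  Juilland's D is used as the dispersion measure purely for
    illustration."""
    f, n = sum(per_text_counts), len(per_text_counts)
    if f == 0 or n < 2:
        return float(f)
    mean = f / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in per_text_counts) / n)
    d = max(0.0, 1.0 - (sd / mean) / math.sqrt(n - 1))   # dispersion in [0, 1]
    return f * d

def collocation_counts(tagged_texts, pattern=("V", "ADJ")):
    """Per-text counts of adjacent word pairs matching a POS pattern, over a
    corpus given as a list of texts, each a list of (word, tag) tokens."""
    per_text = []
    for text in tagged_texts:
        counts = {}
        for (w1, t1), (w2, t2) in zip(text, text[1:]):
            if (t1, t2) == pattern:
                counts[(w1, w2)] = counts.get((w1, w2), 0) + 1
        per_text.append(counts)
    keys = {k for counts in per_text for k in counts}
    return {k: [counts.get(k, 0) for counts in per_text] for k in keys}

def rank_collocations(tagged_texts, pattern=("V", "ADJ")):
    """Rank candidate collocations of a given pattern by adjusted frequency,
    a stand-in for the usefulness index of section 3.1."""
    vectors = collocation_counts(tagged_texts, pattern)
    return sorted(((adjusted_frequency(v), pair) for pair, v in vectors.items()),
                  reverse=True)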

3.2. Coding of bilingual pairs

In this phase the selected words and collocations are translated by personnel with bilingual competence, to obtain bilingual pairs. The personnel used in this phase need no knowledge of any formalism and no linguistic training, as they are only concerned with translating plain text. The list in Figure (2) below (obtained by simply reformatting and sorting the output of the previous phase) exemplifies the input lexicographers receive in this phase. The tags in parentheses are optional. They are meant as guides, or suggestions, to help lexicographers interpret the input, but the input could just be plain words. In our case, we only retained lexical tags for verbs from the previous phase, because we found them useful in clarifying the sense in which a verb is used (e.g. a lexicographer presented with the verb run in isolation would not know whether s/he is to translate the verb in its transitive or intransitive sense).

go (V ADJ) bad =
go (V ADJ) ballistic =
go (V ADJ) berserk =
go (V ADJ) blind =
go (V ADJ) crazy =
go (V ADJ) critical =
go (V ADJ) fabulous =
go (V ADJ) faster =
go (V ADJ) flying =
go (V ADJ) free =
go (V ADJ) great =
go (V ADJ) insane =
go (V ADJ) mad =
go (V ADJ) nuts =
go (V ADJ) off =
go (V ADJ) right =
go (V ADJ) run =
go (V ADJ) soft =
go (V ADJ) sour =
go (V ADJ) straight =
go (V ADJ) undercover =
go (V ADJ) wrong =

Figure 2: Bilingual coding input for go+adjective

The list in Figure (3) below shows the output of this phase, after lexicographers have added translations to their input list.

go (V ADJ) bad = echarse a perder.
go (V ADJ) ballistic = perder los estribos.
go (V ADJ) berserk = enloquecer.
go (V ADJ) blind = quedarse ciego.
go (V ADJ) crazy = volverse loco.
go (V ADJ) faster = acelerarse.
go flying = ir a volar.
go (V ADJ) free = ser libre.
go (V ADJ) great = ir bien.
go (V ADJ) insane = volverse loco.
go (V ADJ) mad = enloquecer.
go (V ADJ) nuts = volverse loco.
go (V ADJ) off = estallar.
go (V ADJ) off = sonar.
go (V ADJ) off = disparar.
go (V ADJ) off = irse.
go (V ADJ) right = girar a la derecha.
go running = irse corriendo.
go (V ADJ) soft = ablandarse.
go (V ADJ) sour = agriarse.
go (V ADJ) straight = seguir derecho.
go (V ADJ) undercover = empezar a trabajar en secreto.
go (V ADJ) wrong = fallar.

Figure 3: Bilingual coding output for go+adjective

As the list shows, lexicographers are asked to add translations in a natural, dictionary-like manner. For instance, inflected forms (e.g. los estribos, plural noun; corriendo, gerund) and enclitic particles (e.g. echarse) are used. As already mentioned, in this phase lexicographers need no knowledge whatsoever of any formalism (except, possibly, understanding the meaning of the lexical tags being used), yet they have considerable freedom in deciding what their final output looks like. For instance, they can remove a lexical tag when they think it is inappropriate (e.g. go flying), inflect a form (e.g. go running), remove an entry when they think it is not a meaningful phrase or when it translates compositionally (e.g. go fabulous), or multiply an entry when it has more than one translation (e.g. go off). This is all possible because the bilingual knowledge lexicographers are coding is purely based on their linguistic intuition, and is independent of any issue of knowledge representation, coding conventions, or other technicalities. Likewise, lexicographers do not need to be acquainted with any specific software to do their job. Familiarity with any text editor is all that is required of them in terms of computing skills. We incidentally note that these characteristics make this coding phase very well suited to being done remotely. Finally, note that only 2 entries out of 22 were removed from this input sample. All the other 20 were assigned a non-compositional translation, showing that the collocation extraction phase successfully achieved this part of its goal.

3.3. Generation of LTRs

In this phase, complete LTRs are automatically generated from the input bilingual pairs. The generation technique is based on bilingual templates extracted from the bootstrap knowledge base. Recall that templates are simply skeletal LTRs, unspecified for words. Given a bilingual pair, the generation algorithm comprises three steps (a sketch follows the list):

1. Look up all words (on both the source and target sides) in monolingual lexicons.

2. Select an LTR template that matches all the lexical categories assigned to the input words.

3. On a successful match, instantiate the selected template with the input words and output the resulting LTR.
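The following Python sketch spells out these three steps under a deliberately simplified representation of templates, lexicons and pairs; all names and the toy data are our own illustrative assumptions, not the system's actual interface.

from itertools import product

def generate_ltrs(pair, src_lexicon, tgt_lexicon, templates):
    """Three-step LTR generation sketch: `pair` is (source_words, target_words);
    each lexicon maps a word to its possible lexical categories; each template
    is a pair of category sequences (source_cats, target_cats)."""
    src_words, tgt_words = pair
    # Step 1: look up each word's possible lexical categories.
    src_cats = [src_lexicon.get(w, []) for w in src_words]
    tgt_cats = [tgt_lexicon.get(w, []) for w in tgt_words]
    candidates = []
    # Step 2: find templates compatible with some licensed category assignment.
    for s_assign in product(*src_cats):
        for t_assign in product(*tgt_cats):
            for s_tpl, t_tpl in templates:
                if tuple(s_tpl) == s_assign and tuple(t_tpl) == t_assign:
                    # Step 3: instantiate the template with the input words.
                    candidates.append((tuple(zip(src_words, s_assign)),
                                       tuple(zip(tgt_words, t_assign))))
    # Several candidates form a disjunction to be resolved during validation;
    # no candidates at all sends the pair to the remainder file.
    return candidates

# Hypothetical toy data, loosely modelled on "go bad = echarse a perder":
templates = [(("v_adj", "adj"), ("v_acc", "refl_pron", "prep", "v_null"))]
src_lexicon = {"go": ["v_adj", "v_intr"], "bad": ["adj"]}
tgt_lexicon = {"echar": ["v_acc"], "refl": ["refl_pron"],
               "a": ["prep"], "perder": ["v_null"]}
print(generate_ltrs((["go", "bad"], ["echar", "refl", "a", "perder"]),
                    src_lexicon, tgt_lexicon, templates))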

The output of our generation procedure for the bilingual pair go bad = echarse a perder (the first item in our input list) is shown in (1) below.

(1)

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) &
bad::(LM,@adj(C)) ⇔
echar::(R,@v_acc(A,B,D)) &
refl::@refl_pron(D) &
a::@prep(A,E) &
perder::(@v_null(E,_),@inf)
\\ trans_verb(L,R)
\\ trans_modifier(LM,R).
%
% OR
%
go::(L,@v_adj(A,B,C)) &
bad::(LM,@adj(C)) ⇔
echar::(R,@v_acc_dat(A,B,D,D)) &
refl::@refl_pron(D) &
perder::(@v_null(E,_),@inf)
\\ personal_a(D,E)
\\ trans_verb(L,R)
\\ trans_modifier(LM,R).

The output of this phase is already a set of complete LTRs, in their final format. A sample output file for the first ten bilingual pairs of our input list is also shown in Appendix 1. For the reader's convenience, we provide here some explanations about the specific LTR formalism we use in our system. The main body of an LTR comprises a left hand side and a right hand side connected by a double arrow ('⇔') operator. Each side comprises zero or more lexical items, connected by the ampersand ('&') operator (an empty list '[]' would represent the special case of zero lexical items). In each lexical item, a double colon ('::') operator connects a word to its description. Descriptions are mainly expressed by macros, introduced by a '@' operator. The macro arguments are indices, as used in lexicalist transfer. The main body of an LTR can have transfer macros attached to it, introduced by a double backslash ('\\') operator. A transfer macro takes two descriptions as arguments and performs some additional transfer, besides what is expressed in the LTR body. For more details about the internal structure of LTRs we refer the reader to (Turcato et al., 1997) and references therein. Finally, we note that lines introduced by a percent sign ('%') are comments.

The generation procedure tries to generate all possible LTRs for an input pair. Therefore, as in the case above, more than one entry can be output for an input pair. This situation is signaled by appropriate comment lines that lexicographers use as a guide in the validation phase. Other kinds of warnings are also output to call the lexicographers' attention to particular situations (e.g. for some classes of lexically ambiguous words, or when lexically unknown words are encountered, thus increasing the indeterminism of generation).

In this phase, the lexical tags present in the input are used as constraints on generation. A word with an associated tag can only be assigned a lexical category matching that tag. This is apparent from our output sample: the only entries in which the English main verb is assigned a category different from v_adj are those for go flying, where the verb lexical tag had been removed by the lexicographer.

Along with the output file of candidate LTRs, a remainder file is produced, containing all the bilingual pairs for which no LTR could be generated. Inspecting this file can be useful for identifying gaps in the lexical template inventory, and possibly fixing such inadequacies. It can also be used as input to a supplementary manual coding activity, if that option is available.

3.4. Validation of LTRs

In the final phase, the output of automatic generation is submitted to lexicographers (possibly different from those employed in the second phase), who revise it in order to remove unwanted entries. This task mostly consists of choosing one LTR (or more) when a disjunction was generated, and commenting out or deleting the others. It also consists of checking LTRs for which some warning was issued, and possibly removing them. In this phase only a passive knowledge of the formalism in use is required (in our specific case, this amounts to knowing the formalism syntax illustrated earlier, the intuitive meaning of the macros, and the way dependencies are expressed by indices). No active coding is performed at this stage. In principle, it would be conceivable to have lexicographers actively correct LTRs at this stage. However, in our project we discarded this option, both because it would have required a higher level of training for the lexicographers employed in this phase, and because it would have created the potential for syntactic inconsistencies in the final output, thus requiring a further consistency checking phase. Instead, we decided to strictly limit the lexicographers' intervention to LTR removal, at the cost of sometimes removing nearly correct LTRs. The final LTR validated for go bad = echarse a perder is shown in (2) below.

(2)

go::(L,@v_adj(A,B,C)) &
bad::(LM,@adj(C)) ⇔
echar::(R,@v_acc(A,B,D)) &
refl::@refl_pron(D) &
a::@prep(A,E) &
perder::(@v_null(E,_),@inf)
\\ trans_verb(L,R)
\\ trans_modifier(LM,R).
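Turning a validated file (like the one in Appendix 1, after commenting out) into a final LTR file (like the one in Appendix 2) is then a matter of discarding comment lines. The Python sketch below assumes the conventions visible in Appendix 1 ('%'-prefixed comments, each LTR terminated by a period); it is our own illustration of this post-processing step, not the project's actual tooling.

def finalize(validated_text):
    """Produce a final LTR file from a validated one by dropping comment
    lines ('%'-prefixed) and blank lines, keeping the surviving entries,
    each of which is assumed to end with a period."""
    entries, current = [], []
    for line in validated_text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue                      # commented-out or empty line
        current.append(line)
        if line.endswith("."):            # end of an LTR
            entries.append(" ".join(current))
            current = []
    return "\n\n".join(entries) + "\n"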

4. Results

We provide here some data about the system's performance. Most of the following data have already been presented, in slightly different form, in (Turcato et al., 1999a) and (McDonald et al., 2000).

Table (1) below gives a rough idea of the success rate of each production step, by showing the number of items (respectively: source language phrases, bilingual pairs, LTRs and LTRs) output by each of the four steps for a specific phrase pattern (intransitive-verb + adjective, the only one for which we currently have data available for all production phases).

Phase                      N. of items
SL expressions acquired    237
Bilingual pairs coded      199
Generated LTRs             346
Validated LTRs             187

Table 1: Success rate at each production step

Note that 10 of the 237 acquired SL expressions were filtered out before the coding phase, because they were found to have already been coded in the existing bilingual lexicon. Also, the figure in the 4th row (validated LTRs) represents both the absolute number of validated LTRs and the number of bilingual pairs for which at least one LTR was validated (this means that each bilingual pair was associated with at most one LTR).

Table (2) shows some more results about the performance of the third phase (LTR generation). Namely, it shows the relation between the number of bilingual pairs input to generation (In) and the number of output LTRs that get validated in the fourth phase (Val), for different files concerning a number of different lexical patterns. We also show the number of bilingual pairs for which at least one LTR was validated (InVal). Note that the files in the first two rows were processed using what we called the enumerative approach to template generation; for the remaining files we used the generative approach.

File                     In     Val    InVal
ADJ                      542    469    468
Phrasal verbs/Batch 1    2340   1416   1414
Phrasal verbs/Batch 2    549    486    469
Phrasal verbs/Batch 3    478    404    393
V + (ADJ or N)           345    300    292
ADJ + N                  1144   914    903
IV + ADJ                 199    187    187

Table 2: LTR automatic generation results

Finally, Table (3) illustrates the average time effort required for the lexicographic work involved in our production cycle, and compares it with the time effort of manually coding LTRs. Each row refers to the processing of 100 items (LTRs for the 1st and 3rd rows, bilingual pairs for the 2nd row). Note that the results refer to the work done by a single lexicographer on all three tasks. Therefore, we believe the figures are more significant for the purpose of comparing the different tasks than for assessing the absolute time required for each activity.

Activity                   Time (hrs)
Manually coding LTRs       16.00
Coding translation pairs   3.12
Validating LTRs            1.59

Table 3: Time effort for lexicographic work

5. Conclusion

Minority languages are often addressed in MT in terms of rapid development of prototypes (Palmer et al., 1998; Jones and Havrilla, 1998), where that expression often translates into the rapid development of minor systems, with low quality and little scalability. We believe that the proposed method is suitable for minority languages in that it requires minimal resources, yet it aims at developing high-quality, large-scale knowledge bases, which are needed as much for minority languages as for widely spoken languages. The dependence on specific formalisms or application domains is confined to the development of the bootstrap knowledge base (and, to a limited extent, to a passive knowledge of the formalism in the validation phase). The output format directly reflects the bootstrap knowledge base format and is transparent to the production process.

6. Acknowledgments

The work on collocation extraction described in section 3.1 was done in collaboration with Scott McDonald of Edinburgh University.

7. References

Jones, Douglas and Rick Havrilla, 1998. Twisted pair grammar: Support for rapid development of Machine Translation for low density languages. In Proceedings of the Third Conference of the Association for Machine Translation in the Americas (AMTA-98). Langhorne, Pennsylvania, USA.

McDonald, Scott, Davide Turcato, Paul McFetridge, Fred Popowich, and Janine Toole, 2000. Collocation discovery for optimal bilingual lexicon development. In Advances in Artificial Intelligence — 13th Biennial Conference of the Canadian Society for Computational Studies of Intelligence, AI'2000, Montréal, Québec, Canada, 14–17 May 2000.

Palmer, Martha, Owen Rambow, and Alexis Nasr, 1998. Rapid prototyping of domain-specific Machine Translation systems. In Proceedings of the Third Conference of the Association for Machine Translation in the Americas (AMTA-98). Langhorne, Pennsylvania, USA.

Popowich, Fred, Paul McFetridge, Davide Turcato, and Janine Toole, forthcoming. Machine translation of closed captions. Machine Translation.

Sadler, Victor and Ronald Vendelmans, 1990. Pilot implementation of a bilingual knowledge bank. In Proceedings of the 13th International Conference on Computational Linguistics (COLING-90). Helsinki, Finland.

Turcato, Davide, 1998. Automatically creating bilingual lexicons for Machine Translation from bilingual text. In Proceedings of the 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics (COLING-ACL'98). Montréal, Québec, Canada.

Turcato, Davide, Olivier Laurens, Paul McFetridge, and Fred Popowich, 1997. Inflectional information in transfer for lexicalist MT. In Proceedings of the International Conference 'Recent Advances in Natural Language Processing' (RANLP-97). Tzigov Chark, Bulgaria.

Turcato, Davide, Paul McFetridge, Fred Popowich, and Janine Toole, 1999a. A bootstrap approach to automatically generating lexical transfer rules. In Proceedings of Machine Translation Summit VII. Singapore.

Turcato, Davide, Paul McFetridge, Fred Popowich, and Janine Toole, 1999b. A unified example-based and lexicalist approach to Machine Translation. In Proceedings of the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99). Chester, UK.

Appendix 1: sample LTR file for lexicographic validation

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) & bad::(LM,@adj(C)) ⇔ echar::(R,@v_acc(A,B,D)) & refl::@refl_pron(D) & a::@prep(A,E) & perder::(@v_null(E,_),@inf) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & bad::(LM,@adj(C)) ⇔ echar::(R,@v_acc_dat(A,B,D,D)) & refl::@refl_pron(D) & perder::(@v_null(E,_),@inf) \\ personal_a(D,E) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

go::(L,@v_adj(A,B,C)) & ballistic::(LM,@adj(C)) ⇔ perder::(R,@v_acc(A,B,C)) & defdet::@def_art(C) & estribo::(RM,@noun(C),@plur) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & berserk::(LM,@adj(C)) ⇔ enloquecer::(R,@v_null(A,B)) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) & crazy::(LM,@adj(C)) ⇔ volver::(R,@v_acc_dat(A,B,D,E)) & refl::@refl_pron(E) & loco::@noun(D) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & crazy::(LM,@adj(C)) ⇔ volver::(R,@v_ap_acc(A,B,D,C)) & refl::@refl_pron(D) & loco::(RM,@adj(C)) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & faster::(LM,@adj(C)) ⇔ acelerar::(R,@v_acc(A,B,D)) & refl::@refl_pron(D) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) & flying::(LM,@adj(C)) ⇔ ir::(R,@v_null(A,B)) & a::@prep(A,D) & volar::(@v_null(D,_),@inf) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & flying::(LM,@adj(C)) ⇔ ir::(R,@v_p_phrase(A,B,D)) & a::@prep(D,E) & volar::(@v_null(E,_),@inf) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_prespart(A,B,C)) & fly::(@v_intr(C,_),@pres_part) ⇔ ir::(R,@v_null(A,B)) & a::@prep(A,D) & volar::(@v_null(D,_),@inf) \\ trans_verb(L,R).
% OR
go::(L,@v_prespart(A,B,C)) & fly::(@v_intr(C,_),@pres_part) ⇔ ir::(R,@v_p_phrase(A,B,D)) & a::@prep(D,E) & volar::(@v_null(E,_),@inf) \\ trans_verb(L,R).
% OR
go::(L,@v_obj_adj(A,B,C,D)) & flying::@adj(D) ⇔ ir::(R,@v_acc(A,B,C)) & a::@prep(A,E) & volar::(@v_null(E,_),@inf) \\ trans_verb(L,R).

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) & free::(LM,@adj(C)) ⇔ ser::(R,@v_a_phrase(A,B,C)) & libre::@adj(C) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & free::(LM,@adj(C)) ⇔ ser::(R,@v_null(A,B)) & libre::@adj(A) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) & great::(LM,@adj(C)) ⇔ ir::(R,@verb_null(A,B)) & bien::(@post_v_adv(A)) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & great::(LM,@adj(C)) ⇔ ir::(R,@verb_null(A,B)) & bien::@adj(A) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & great::(LM,@adj(C)) ⇔ ir::(R,@verb_acc(A,B,C)) & bien::(RM,@noun(C)) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

% DISJUNCTION: comment out unwanted entries.
%
go::(L,@v_adj(A,B,C)) & insane::(LM,@adj(C)) ⇔ volver::(R,@v_acc_dat(A,B,D,E)) & refl::@refl_pron(E) & loco::@noun(D) \\ trans_verb(L,R) \\ trans_modifier(LM,R).
% OR
go::(L,@v_adj(A,B,C)) & insane::(LM,@adj(C)) ⇔ volver::(R,@v_ap_acc(A,B,D,C)) & refl::@refl_pron(D) & loco::(RM,@adj(C)) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & mad::(LM,@adj(C)) ⇔ enloquecer::(R,@v_null(A,B)) \\ trans_verb(L,R) \\ trans_modifier(LM,R).



Appendix 2: sample final LTR file

go::(L,@v_adj(A,B,C)) & bad::(LM,@adj(C)) ⇔ echar::(R,@v_acc(A,B,D)) & refl::@refl_pron(D) & a::@prep(A,E) & perder::(@v_null(E,_),@inf) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

go::(L,@v_adj(A,B,C)) & ballistic::(LM,@adj(C)) ⇔ perder::(R,@v_acc(A,B,C)) & defdet::@def_art(C) & estribo::(RM,@noun(C),@plur) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & berserk::(LM,@adj(C)) ⇔ perder::(R,@v_acc(A,B,C)) & defdet::@def_art(C) & chaveta::(RM,@noun(C)) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & crazy::(LM,@adj(C)) ⇔ volver::(R,@v_ap_acc(A,B,D,C)) & refl::@refl_pron(D) & loco::(RM,@adj(C)) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & faster::(LM,@adj(C)) ⇔ acelerar::(R,@v_acc(A,B,D)) & refl::@refl_pron(D) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

go::(L,@v_prespart(A,B,C)) & fly::(@v_intr(C,_),@pres_part) ⇔ ir::(R,@v_null(A,B)) & a::@prep(A,D) & volar::(@v_null(D,_),@inf) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & free::(LM,@adj(C)) ⇔ ser::(R,@v_a_phrase(A,B,C)) & libre::@adj(C) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

go::(L,@v_adj(A,B,C)) & great::(LM,@adj(C)) ⇔ ir::(R,@verb_null(A,B)) & bien::(@post_v_adv(A)) \\ trans_verb(L,R) \\ trans_modifier(LM,R).

go::(L,@v_adj(A,B,C)) & insane::(LM,@adj(C)) ⇔ volver::(R,@v_ap_acc(A,B,D,C)) & refl::@refl_pron(D) & loco::(RM,@adj(C)) \\ trans_modifier(LM,RM) \\ trans_verb(L,R).

go::(L,@v_adj(A,B,C)) & mad::(LM,@adj(C)) ⇔ enloquecer::(R,@v_acc(A,B,D)) & refl::@refl_pron(D) \\ trans_verb(L,R) \\ trans_modifier(LM,R).