Lang Resources & Evaluation (2011) 45:165–179
DOI 10.1007/s10579-010-9133-9

ORIGINAL PAPER
Recursive alignment block classification technique for word reordering in statistical machine translation

Marta R. Costa-jussà · José A. R. Fonollosa · Enric Monte
Published online: 26 November 2010
© Springer Science+Business Media B.V. 2010
Abstract Statistical machine translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between source and target language. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning in order to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. Afterwards, we classify these pairs into groups, following recursively a co-occurrence block criterion, in order to infer reorderings. Inside the same group, we allow new internal combinations in order to generalize the reordering to unseen pairs of blocks. Then, we identify the pairs of blocks in the source corpora (both training and test) which belong to the same group. We swap them and use the modified source training corpus to realign and to build the final translation system. We have evaluated our reordering approach in terms of both alignment and translation quality. In addition, we have used two state-of-the-art SMT systems: a Phrase-based and an Ngram-based one. Experiments are reported on the EuroParl task, showing improvements of about 1 point in the standard MT evaluation metrics (mWER and BLEU).

M. R. Costa-jussà (✉)
Barcelona Media Innovation Center, Av. Diagonal 177, 08018 Barcelona, Spain
e-mail: [email protected]

J. A. R. Fonollosa · E. Monte
Universitat Politècnica de Catalunya, TALP Research Center, Jordi Girona 1-3, 08034 Barcelona, Spain

J. A. R. Fonollosa
e-mail: [email protected]

E. Monte
e-mail: [email protected]
Keywords Statistical machine translation · Word reordering · Statistical classification · Automatic evaluation
1 Introduction

Statistical machine translation (SMT) has yielded significant improvements over the initial word-based translation models (Brown et al. 1993). At the end of the last decade, the use of context in the translation model (the Phrase-based approach) brought a clear improvement in translation quality (Zens et al. 2004). In parallel to the Phrase-based approach, the use of a language model of bilingual units gives comparable results (Mariño et al. 2006). In both systems, the introduction of some reordering capability is of crucial importance for some language pairs (Costa-jussà and Fonollosa 2009).

In our approach, we use alignment information together with a novel classification algorithm to modify the order of the source corpora, so that word alignments and translation units become more monotonic. The proposed classification algorithm parses alignments to infer reorderings. Unseen reordering candidates are handled through pairs of swapping blocks belonging to the same group. The groups are created following recursively a co-occurrence block criterion.

This paper is organized as follows. In Sect. 2, we describe the reordering process and the algorithm which infers the reorderings. In Sect. 3, we briefly describe the two baseline systems, a Phrase-based and an Ngram-based system, both capable of producing state-of-the-art SMT translations. Then, in Sect. 4, we set the evaluation framework and describe the experiments and results. Finally, in Sect. 5, we present the conclusions of this work.
2 Reordering based on alignment blocks classification (RABC)

2.1 Motivation

SMT systems are trained on a bilingual corpus composed of bilingual sentences. Each bilingual sentence consists of a source sentence and a target sentence, which we align at the word level using GIZA++ (Och and Ney 2003). Generally, this alignment contains a certain amount of errors which deteriorate translation quality. One way of improving the alignment is monotonization (Kanthak et al. 2005), i.e. reordering the words in the source sentence following the order of the words in the target sentence. For instance, in a Spanish-to-English translation, the original sentence El discurso político fue largo would be modified to El político discurso fue largo. This monotonizes the alignment: El#The político#political discurso#speech fue#was largo#long. In Popovic and Ney (2006), the authors design rules based on Part-of-Speech (POS) tags and reorder pairs of words both in the training and test sentences. Similarly, we propose one type of monotonization: pairs of consecutive blocks (sequences of words) are swapped only if swapping them generates a correct monotonic translation.
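To make the monotonization step concrete, here is a minimal sketch (our illustration, not the authors' code) that reorders the source words of one sentence by the target positions of their alignment links; the link format and the policy for unaligned words are our assumptions.

```python
def monotonize(source_words, alignment):
    """Reorder source words to follow the target word order.

    alignment: list of (src_idx, tgt_idx) links, e.g. read from
    GIZA++ output. Each source word is keyed by the smallest target
    position it links to; unaligned words inherit the key of the
    previous aligned word, so they stay close to their context.
    """
    key = {}
    for s, t in alignment:
        key[s] = min(t, key.get(s, t))
    keys, last = [], -1
    for i in range(len(source_words)):
        last = key.get(i, last)
        keys.append(last)
    order = sorted(range(len(source_words)), key=lambda i: (keys[i], i))
    return [source_words[i] for i in order]

# "El discurso político fue largo" vs. "The political speech was long"
src = ["El", "discurso", "político", "fue", "largo"]
links = [(0, 0), (1, 2), (2, 1), (3, 3), (4, 4)]
print(monotonize(src, links))
# ['El', 'político', 'discurso', 'fue', 'largo']
```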
Fig. 1 Example of an Alignment Block, i.e. a pair of consecutive blocks whose target translation is swapped: the Spanish blocks Europa and central y oriental aligned to the English Central and Eastern Europe
The main difference from Popovic and Ney (2006) is that our approach learns the blocks which swap instead of following a pre-defined set of rules. Figure 1 shows an example of this type of pair. Reordering based on blocks covers most cases, as shown in Tillmann and Zhang (2005).

2.2 Reordering process

Our purpose is to model the effect of local block reordering in order to: (1) monotonize the alignment; and (2) be able to generalize in the test stage. To fulfil (1) and (2), the reordering process consists of the following steps (Costa-jussà et al. 2008):

• Given a word alignment, we extract a List of Alignment Blocks (LAB). An Alignment Block consists of a pair of consecutive source blocks whose target translation is swapped (see Fig. 1, and the sketch after this list).
• Given the LAB, we apply the Alignment Block Classification algorithm (see next section), which allows us to decide whether two consecutive blocks have to be reordered or not.
• We use the criteria of the Alignment Block Classification to reorder the source corpora (including training, development and test sets).
• Given the modified source training corpus, we realign it with the original target training corpus.
• We build the systems as shown in Sect. 3, using the monotonized alignment, and we translate the modified source test set.
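As a sketch of the first step, the following (our reconstruction, simplified to single-word blocks, the case used in the experiments of Sect. 4) collects the Alignment Blocks of a sentence by detecting crossed links between consecutive source words; it assumes at most one link per word, which real alignments do not guarantee.

```python
def extract_alignment_blocks(source_words, alignment):
    """Collect single-word Alignment Blocks: pairs of consecutive
    source words whose target translations are swapped.

    alignment: dict mapping a source index to one target index
    (a simplification; the paper's blocks may span several words
    and a word may carry several links).
    """
    lab = []
    for i in range(len(source_words) - 1):
        if i in alignment and i + 1 in alignment:
            if alignment[i] > alignment[i + 1]:   # crossed links
                lab.append((source_words[i], source_words[i + 1]))
    return lab

# "Europa central y oriental" vs. "Central and Eastern Europe"
src = ["Europa", "central", "y", "oriental"]
links = {0: 3, 1: 0, 2: 1, 3: 2}
print(extract_alignment_blocks(src, links))
# [('Europa', 'central')]
```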
2.3 Algorithm for alignment block classification

The objective of this algorithm is to infer block reorderings in case the order of the blocks differs from source to target. The algorithm should be able to cope with swapping examples seen during training; it should infer properties that allow reordering pairs of blocks not seen together during training; and, finally, it should be robust with respect to training errors and ambiguities. The algorithm consists of two steps. In the first, given the LAB, the algorithm filters out the ambiguous Alignment Blocks (i.e. either misaligned or incoherently aligned). We define the filtered LAB as $LAB_{filt}$, which is a subset of the LAB and consists of m pairs of blocks $\{(a_1, b_1), (a_2, b_2), \ldots, (a_m, b_m)\}$.
In the second step, from the $LAB_{filt}$ we create the sets $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_m\}$ and the groups $G_1, \ldots, G_n, \ldots, G_N$. A given group $G_n$ is created following recursively a co-occurrence block criterion (see next subsection) and has the form $G_n = \{(a_1, b_1), \ldots, (a_p, b_p)\}$, where p is the cardinality of $G_n$; within each group $G_n$ we also create the sets $A_n = \{a_1, \ldots, a_p\}$ and $B_n = \{b_1, \ldots, b_p\}$. From each group $G_n$ we build a Generalization group $Gg_n$, with $n = 1, \ldots, N$, defined as the Cartesian product of the subsets $A_n \subset A$ and $B_n \subset B$, i.e. $Gg_n = A_n \times B_n$. This group allows us to reorder cases such as $(a_r, b_s)$ with $a_r \in A_n$, $b_s \in B_n$ and $(a_r, b_s) \notin LAB_{filt}$. We can deal with possible inconsistencies by increasing the filtering threshold, and therefore limiting the number of allowed unseen pairs, and also by processing with morphological information. Note that we assume that only the elements in $Gg_n$ that appear in the training and test corpora are correct generalizations (see Sect. 4.2).

2.4 Outline of the algorithm

The first phase of the algorithm filters out possible bad alignments or ambiguities, using the following criteria (a sketch follows the list):

• Pairs appearing less than $N_{min}$ times are discarded.
• Pairs of blocks with a swapping probability ($P_{swap}$) lower than a threshold are also discarded. We define the swapping probability as the ratio between the number of times that two blocks are swapped and the total number of times that the same two blocks appear consecutively.
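A minimal sketch of this filtering phase is given below; the count dictionaries are assumed to have been collected from the aligned training corpus, and the default thresholds are the values tuned in Sect. 4.2.2.

```python
def filter_lab(swapped_counts, total_counts, n_min=5, p_swap_min=0.33):
    """First phase: keep only reliable Alignment Blocks.

    swapped_counts[(a, b)]: times the blocks a, b were seen swapped.
    total_counts[(a, b)]: times a, b appeared consecutively at all.
    """
    lab_filt = []
    for pair, n_swapped in swapped_counts.items():
        if n_swapped < n_min:
            continue                      # discard rare pairs
        p_swap = n_swapped / total_counts[pair]
        if p_swap >= p_swap_min:          # discard unreliable swaps
            lab_filt.append(pair)
    return lab_filt

swapped = {("discurso", "político"): 12, ("gran", "hombre"): 2}
total = {("discurso", "político"): 15, ("gran", "hombre"): 9}
print(filter_lab(swapped, total))   # [('discurso', 'político')]
```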
The second phase of the algorithm infers the generalization groups $Gg_n$ from the filtered LAB ($LAB_{filt}$). These generalization groups can be seen as word classes, which go from 1 to N. Given the $LAB_{filt}$, the generalization groups $Gg_n$ are constructed using the following loop:

1. Initialization: set $n \leftarrow 1$ and $LAB'_{filt} \leftarrow LAB_{filt}$.
2. Main part: while $LAB'_{filt}$ is not empty do
   – $G_n = \{(a_k, b_k)\}$, where $(a_k, b_k)$ is any element of $LAB'_{filt}$
   – Recursively, move elements $(a_i, b_i)$ from $LAB'_{filt}$ to $G_n$ if there is an element $(a_j, b_j) \in G_n$ such that $a_i = a_j$ or $b_i = b_j$
   – Increase n (i.e. $n \leftarrow n + 1$)
3. Ending: For each $G_n$, construct the two sets $A_n$ and $B_n$, which consist of the first and second elements of the pairs in $G_n$, respectively. Then the Cartesian product of $A_n$ and $B_n$ is assigned to $Gg_n$, i.e. $Gg_n \leftarrow A_n \times B_n$.
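The loop can be sketched as follows (our reconstruction, not the authors' implementation). Moving elements until nothing changes makes the grouping a transitive closure: the groups are the connected components of $LAB_{filt}$ under shared first or second elements.

```python
def group_blocks(lab_filt):
    """Second phase: recursively group Alignment Blocks that share
    either their first element (a) or their second element (b)."""
    remaining = list(lab_filt)
    groups = []
    while remaining:
        group = [remaining.pop()]        # seed a new group G_n
        changed = True
        while changed:                   # recursive closure
            changed = False
            for pair in remaining[:]:
                if any(pair[0] == a or pair[1] == b for a, b in group):
                    group.append(pair)
                    remaining.remove(pair)
                    changed = True
        groups.append(group)
    return groups

def generalization_group(group):
    """Cartesian product A_n x B_n of the elements of a group G_n."""
    a_set = {a for a, _ in group}
    b_set = {b for _, b in group}
    return {(a, b) for a in a_set for b in b_set}
```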
2.5 Recursive classification algorithm example

• Given a $LAB_{filt}$, classify each Alignment Block into a group, using a co-occurrence criterion. In this example, $LAB_{filt}$ = {(abogado, europeo), (abogado, americano), (apoyar, totalmente), (parlamento, australiano), (parlamento, europeo), (oponer, totalmente), (traductor, americano)}.

1. Start a loop. Take an unclassified Alignment Block from the $LAB_{filt}$ and create a group.

   Group 1: $A_1$ = {abogado}, $B_1$ = {europeo}

2. Add to the group all the Alignment Blocks which are partially equal (side A or B) to the ones already in the new group.

   Group 1: {(abogado, europeo), (abogado, americano), (parlamento, europeo)}

3. Reach the end of the $LAB_{filt}$. End of the loop.

   (a) If one or more Alignment Blocks have been classified during the loop, go to step 2.

       Group 1: {(abogado, europeo), (abogado, americano), (parlamento, europeo), (parlamento, australiano), (traductor, americano)}

   (b) If no Alignment Block has been classified during the loop and there are still unclassified Alignment Blocks in the $LAB_{filt}$, go to step 1.

       Group 2: {(apoyar, totalmente), (oponer, totalmente)}

   (c) If no Alignment Block has been classified during the loop and there are no unclassified Alignment Blocks in the $LAB_{filt}$, stop.

• Once all the Alignment Blocks in the $LAB_{filt}$ are classified, the final groups are:

   Group 1: $A_1$ = {abogado, parlamento, traductor}, $B_1$ = {europeo, americano, australiano}
   Group 2: $A_2$ = {apoyar, oponer}, $B_2$ = {totalmente}
• Inside the same group, we allow new internal combinations in order to generalize the reordering to unseen sequences of words. These are the so-called Generalization groups. We can see the generalizations to unseen reorderings in Group 1: abogado australiano, parlamento americano, traductor australiano and traductor europeo.
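Running the sketch from Sect. 2.4 on this example (assuming the group_blocks and generalization_group functions defined there) reproduces both groups and exactly these four generalizations:

```python
lab_filt = [("abogado", "europeo"), ("abogado", "americano"),
            ("apoyar", "totalmente"), ("parlamento", "australiano"),
            ("parlamento", "europeo"), ("oponer", "totalmente"),
            ("traductor", "americano")]

for g in group_blocks(lab_filt):
    # Generalizations: unseen pairs of the Cartesian product A_n x B_n.
    print(sorted(generalization_group(g) - set(g)))
# Group 1: [('abogado', 'australiano'), ('parlamento', 'americano'),
#           ('traductor', 'australiano'), ('traductor', 'europeo')]
# Group 2: [] (all pairs of {apoyar, oponer} x {totalmente} were seen)
```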
2.6 Using extra information

Additionally, the Alignment Block Classification can be used to extract blocks from a lemmatized corpus. The resulting $Gg_n$ will be able to deal with grammatical agreement between the elements of each block: for instance, the pair (conferencia, parlamentario), which does not have gender agreement, would be a correct generalization if we take each block as a lemma. The generalization is therefore not influenced by the particular distribution of word inflections in the training database. Furthermore, we can use a tagger to find out the grammatical function of each word. In case the blocks consist of only one word, i.e. the Alignment Blocks are pairs of swapping words, a general grammar rule to take into account for Spanish-to-English translation is that in Spanish most adjectives are placed after the noun, whereas in English it is the opposite. However, there are exceptions to this rule and we cannot rely completely on it; e.g. big man is translated as gran hombre when it refers to a man who has somehow succeeded. The use of this morphological information is optional. The algorithm itself does not require extra information beyond what is employed in a standard SMT system. But it is interesting to benefit from morphological information, if available, as has been shown in other studies (Nießen and Ney 2001). In this study we use lemmas and POS tags.
3 Baseline systems

Two baseline systems are proposed to test our approach. The main difference between the two systems is the translation model, which constitutes the actual core of the translation systems. In both cases it is based on bilingual units. A bilingual unit consists of two monolingual fragments, where each one is assumed to be the translation of the other.

3.1 Ngram-based translation model

The translation model can be thought of as a language model of bilingual units (here called tuples). These tuples define a monotonic segmentation of the training sentence pairs $(f_1^J, e_1^I)$ into K units $(t_1, \ldots, t_K)$. The translation model is implemented using an Ngram language model (for N = 3):

$$p(e, f) = \Pr(t_1^K) = \prod_{k=1}^{K} p(t_k \mid t_{k-2}, t_{k-1}) \qquad (1)$$
Bilingual units (tuples) are extracted from any word alignment according to the following constraints:

• a monotonic segmentation of each bilingual sentence pair is produced,
• no word inside a tuple is aligned to words outside the tuple, and
• no smaller tuples can be extracted without violating the previous constraints.
As a consequence of these constraints, only one segmentation is possible for a given sentence pair, which allows us to build a bilingual language model. This bilingual language model is used as the translation model. See Mariño et al. (2006) for further details.
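The following sketch illustrates this unique segmentation (our reconstruction under simplifying assumptions: a cut is placed wherever no alignment link crosses it, and trailing unaligned source words are attached to the last tuple).

```python
def extract_tuples(src, tgt, links):
    """Extract the unique monotone tuple segmentation of a sentence
    pair. links: set of (src_idx, tgt_idx) alignment links."""
    cuts, max_src = [], -1
    for i in range(len(tgt)):
        # Largest source index linked to any target word up to i.
        max_src = max([max_src] + [s for s, t in links if t == i])
        rest = [s for s, t in links if t > i]
        # Valid cut: no remaining link reaches back across it.
        if not rest or min(rest) > max_src:
            cuts.append((max_src, i))
    if cuts:  # attach trailing unaligned source words to the last tuple
        cuts[-1] = (len(src) - 1, cuts[-1][1])
    tuples, s0, t0 = [], 0, 0
    for s_end, t_end in cuts:
        tuples.append((src[s0:s_end + 1], tgt[t0:t_end + 1]))
        s0, t0 = s_end + 1, t_end + 1
    return tuples

print(extract_tuples(["Europa", "central", "y", "oriental"],
                     ["Central", "and", "Eastern", "Europe"],
                     {(0, 3), (1, 0), (2, 1), (3, 2)}))
# [(['Europa', 'central', 'y', 'oriental'],
#   ['Central', 'and', 'Eastern', 'Europe'])]
```

Note how the crossed links force a single four-word tuple here; after RABC reordering the links become monotone and four one-word tuples can be extracted, which is precisely why monotonization yields smaller, more reusable units.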
3.2 Phrase-based translation model

The basic idea of Phrase-based translation is to segment the given source sentence into units (here called phrases), then translate each phrase and finally compose the target sentence from these phrase translations (Zens et al. 2004). Given a sentence pair and a corresponding word alignment, a phrase (or bilingual phrase) is any pair of m source words and n target words that satisfies two basic constraints:

1. Words are consecutive along both sides of the bilingual phrase.
2. No word on either side of the phrase is aligned to a word outside the phrase.
We limit the maximum size of any given phrase to 7 words. The huge increase in computational and storage cost of including longer phrases does not provide a significant improvement in quality (Koehn et al. 2003), as the probability of reappearance of larger phrases decreases. Given the collected phrase pairs, we estimate the phrase translation probability distribution by relative frequency in both directions:

$$P(f \mid e) = \frac{N(f, e)}{N(e)} \qquad\qquad P(e \mid f) = \frac{N(f, e)}{N(f)}$$

where N(f, e) is the number of times the phrase f is translated by e.
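A minimal sketch of this relative-frequency estimation (our illustration; real trainers additionally apply smoothing and lexical weighting):

```python
from collections import Counter

def estimate_phrase_probs(phrase_pairs):
    """Relative-frequency estimates P(f|e) and P(e|f) from a list
    of extracted (f, e) phrase-pair instances."""
    n_fe = Counter(phrase_pairs)
    n_e = Counter(e for _, e in phrase_pairs)
    n_f = Counter(f for f, _ in phrase_pairs)
    p_f_given_e = {fe: c / n_e[fe[1]] for fe, c in n_fe.items()}
    p_e_given_f = {fe: c / n_f[fe[0]] for fe, c in n_fe.items()}
    return p_f_given_e, p_e_given_f

pairs = 3 * [("discurso político", "political speech")] \
      + 1 * [("discurso político", "political address")]
p_fe, p_ef = estimate_phrase_probs(pairs)
print(p_fe[("discurso político", "political speech")])   # 1.0
print(p_ef[("discurso político", "political speech")])   # 0.75
```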
3.3 Additional feature functions

In each system, the translation model is combined in a log-linear framework with additional feature functions:

• The target language model consists of an n-gram model, in which the probability of a translation hypothesis is approximated by the product of word n-gram probabilities. As the default language model feature function, we use a standard word-based 5-gram language model with Kneser-Ney smoothing (Kneser and Ney 1995) and interpolation of higher- and lower-order n-grams, built with SRILM (Stolcke 2002).
• The forward and backward lexicon models provide lexical translation probabilities for each phrase/tuple based on IBM model 1 word probabilities. For the forward lexicon model, IBM model 1 probabilities from GIZA++ source-to-target alignments are used; for the backward lexicon model, target-to-source alignments are used instead.
• The word bonus model introduces a sentence length bonus in order to compensate for the system's preference for short output sentences.
• The phrase bonus model introduces a constant bonus per produced phrase and is only used in the Phrase-based system.
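A hedged sketch of the log-linear combination used to score hypotheses (the feature names and weight values below are ours, for illustration only):

```python
import math

def loglinear_score(features, weights):
    """Score of a hypothesis: sum over models of lambda_m * h_m,
    where h_m is usually the log of a model probability."""
    return sum(weights[name] * h for name, h in features.items())

features = {"translation_model": math.log(0.04),
            "target_lm": math.log(0.002),
            "forward_lexicon": math.log(0.03),
            "backward_lexicon": math.log(0.02),
            "word_bonus": 5.0,       # output length in words
            "phrase_bonus": 2.0}     # number of phrases used
weights = {"translation_model": 1.0, "target_lm": 0.9,
           "forward_lexicon": 0.2, "backward_lexicon": 0.2,
           "word_bonus": 0.3, "phrase_bonus": -0.1}
print(loglinear_score(features, weights))
```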
Fig. 2 System description
All these feature functions are combined in the decoder. The different weights are optimized on the development set using the Simplex algorithm (Nelder and Mead 1965).

3.4 Enhancing an SMT system with the RABC technique

We introduce the RABC technique in an SMT system as follows:

1. Given the candidates to be swapped (Alignment Blocks proposed by the RABC technique), we reorder the source corpora, including training, development and test sets.
2. Given the reordered source training corpus, we realign it with the original target training corpus. Notice that we compute the word alignment twice: once before extracting the LAB, and once after the corpus is monotonized to build the SMT system.
3. We build the Phrase- and Ngram-based systems, respectively, using the new monotonized corpora. Notice that the decoding procedure does not have to be changed for the new system (with reordering).

Figure 2 describes the application of the RABC technique to an SMT system.
4 Evaluation framework

4.1 Corpus statistics

Experiments have been carried out using the EPPS database (Spanish-English). The EPPS data set corresponds to the parliamentary session transcriptions of the European Parliament and is currently available at the Parliament's website.1 The training corpus contains about 1.3 million sentences; Table 1 shows the corpus statistics.
1 http://www.europarl.eu.int/.
Table 1 EuroParl corpus: basic statistics for the considered training, development and test data sets. The development and test data sets have 2 references (M and k stand for millions and thousands, respectively)

EPPS                       Spanish    English
Training sentences         1.3 M      1.3 M
  Words                    36.6 M     35 M
  Vocabulary               153.1 k    106.5 k
  Lemma vocabulary         78.3 k     91 k
Development sentences      430        430
  Words                    15.3 k     16 k
  Vocabulary               3.2 k      2.7 k
  Lemma vocabulary         2.1 k      2.2 k
Test sentences             840        840
  Words                    22.7 k     20.3 k
  Vocabulary               4 k        4.3 k
  Lemma vocabulary         2.6 k      3.3 k
More information can be found at the consortium website.2 For the results presented here, we have used the version of the EPPS data that was made available by RWTH Aachen University through the TC-STAR consortium.3 The English POS tagging has been carried out using the freely available TnT tagger (Brants 2000), and lemmatization using the wnmorph tool included in the WordNet package (Miller et al. 1991). For Spanish, we have used the FreeLing analysis tool (Carreras et al. 2004), which generates the POS tag and the lemma for each input word.

4.2 Experiments and results

First, we study the most common reordering patterns found in our task. We have a reference corpus which consists of 500 manually aligned bilingual sentences (Lambert 2008). Given the word alignment reference, we can extract the reordering patterns. The most common reordering patterns are described as in de Gispert and Mariño (2003): $(x_1, y_1)(x_2, y_2)\ldots(x_N, y_N)$, where each $(x_i, y_i)$ describes a link between positions $x_i$ and $y_i$ in the original and the reordered source sentence, the latter composed of the source words appearing in the monotonization of the alignment. This means that the cross (0,1)(1,0) reflects $a_n b_n \rightarrow b_n a_n$, where $a_n$ ($b_n$) is a single word. Table 2 presents the most frequent reordering patterns when aligning from Spanish to English on the EuroParl task.

The most frequent pattern, (0,1)(1,0), usually takes the form Noun+Adj = Adj+Noun, as in semana siguiente = following week. Less frequently we find the form Adv1+Adv2 = Adv2+Adv1, as in bastante bien = good enough.
2 http://www.tc-star.org/.
3 TC-STAR (Technology and Corpora for Speech to Speech Translation) is a European Community project funded by the Sixth Framework Programme.
Table 2 Reordering patterns for the Es2En reference alignment of 500 sentences

Reordering pattern SPA->ENG      Counts    %
(0,1)(1,0)                       392       38.4
(0,2)(1,0)(2,1)                  113       11
(0,1)(1,2)(2,0)                  112       11
(0,2)(1,3)(2,0)(3,1)             38        3.7
(0,3)(1,0)(2,1)(3,2)             37        3.7
(0,2)(1,1)(2,0)                  25        2.4
(0,3)(1,4)(2,0)(3,1)(4,2)        16        1.6
Most frequent patterns           733       71.8
The second most frequent pattern, (0,2)(1,0)(2,1), almost always takes the form of a noun followed by a prepositional clause in Spanish (Noun+Prep+Noun) and Adj+Noun in English, as in conferencia de prensa = press conference, or comparative adjectives, as in discurso más largo = longer speech (de Gispert and Mariño 2003). The third pattern, (0,1)(1,2)(2,0), reveals the relationship Adv+Verb = Pronoun+Verb+Adv existing in cases such as nunca podemos = we can never, a relationship impossible to detect when aligning in the other direction, given the asymmetry of the alignment models (Brown et al. 1993). When using automatic alignment, even some of the second and third most frequent pattern crosses have a high probability of being wrongly detected, as in teoría puedo = I can conceivably, where the Spanish word en is omitted in front of the cross. In this case, it seems advisable to use multi-words, which consist in linking expressions into a single token (e.g. por favor, de hecho or en teoría), and to combine information from the alignments in both directions. Several works in this direction can be found in Lambert (2008).

The experiments presented later in this section deal with the most frequent reordering pattern, (0,1)(1,0). In this work, no additional reorderings are considered, given that the other crossings have a much lower frequency, as shown in Table 2, and state-of-the-art automatic alignment usually fails to detect them.

4.2.1 LAB filtering parameters Nmin and Pswap

The number of admissible blocks in the LAB is a function of the minimal number of block co-occurrences ($N_{min}$) and the probability of the blocks being swapped ($P_{swap}$); see Sect. 2.4. We determine these parameters from a subset of the corpus as follows. We remove the 500 manually aligned sentences from the training corpus and train the Alignment Block Classification algorithm. The reference set is the source set modified so that the pairs of words which swap follow the order of the target language. Then, a swapping of two words is a Success (S) if the reference alignment is swapped (e.g. discurso interesante is swapped to interesante discurso and the reference is interesting speech), or a Failure (F) if the reference alignment is not swapped (e.g. gran discurso is swapped to discurso gran, but the reference, big speech, is not swapped). Combining these two sources of information, we use the Simplex algorithm to minimise the following cost function:
$$Q = -(N_S - N_F)$$

We have chosen the cost function Q as a coherent criterion that jointly rewards the number of successes ($N_S$) and penalizes the number of failures ($N_F$). The cost function Q takes two quantized variables as arguments, and its output is a difference between two integers; therefore, gradient-based optimization techniques are not feasible. Note that the underlying problem is a multi-objective optimization, which we have transformed into a single-objective optimization problem by giving equal importance to the two objectives, i.e. Successes and -Failures. For this kind of problem, direct search techniques such as the Simplex algorithm are adequate.

Figure 3 shows the relation between the two objectives, which gives a curve similar to the ROC curve used in detection theory. As in ROC analysis, an increase in the success rate increases the failure rate; therefore there is a trade-off between the two objectives. The solution that we have selected is the intersection of the diagonal with the curve, which corresponds to a trade-off that gives the same weight to both objectives. The best value of Q corresponds to the curve of lemma reordering plus tags.

Given the optimum values of $N_{min}$ and $P_{swap}$, we have also studied the number of good generalizations, i.e. the pairs of words which were swapped correctly but were not seen swapped during training. Almost half of the Successes are generalizations. We see that the generalization groups $Gg_n$ (see Sect. 2.3) help more than they hurt. The generalizations provide reorderings that would be done neither by the Phrase-based system nor by the Ngram-based system, and they help both the realignment and the translation. The word alignment gives priority to the monotonic order, so alignment links become more robust. In addition, the translation system can extract and use smaller units if the alignment is more monotonic.
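The parameter search of Sect. 4.2.1 can be sketched as below; we show an exhaustive grid in place of the authors' Simplex search, and the evaluation against the 500 manually aligned sentences is assumed to be given.

```python
def tune_thresholds(evaluate, n_min_grid, p_swap_grid):
    """Pick the (Nmin, Pswap) pair minimizing Q = -(N_S - N_F).

    evaluate(n_min, p_swap) is assumed to reorder the held-out
    source side and return (successes, failures) with respect to
    the manual alignment reference.
    """
    best = None
    for n_min in n_min_grid:
        for p_swap in p_swap_grid:
            n_s, n_f = evaluate(n_min, p_swap)
            q = -(n_s - n_f)                 # the cost function Q
            if best is None or q < best[0]:
                best = (q, n_min, p_swap)
    return best   # (Q, Nmin, Pswap)
```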
Fig. 3 Relation between successes and failures (with different parameters $N_{min}$, $P_{swap}$) for the reordering based on Alignment Block Classification, (1) using lemmas and (2) using lemmas plus tags, on the manually aligned reference corpus. The number of failures is on the x-axis, the number of successes on the y-axis, and the diagonal marks the equal-weight trade-off
Some pairs in $Gg_n$ make no sense from the syntactic or semantic point of view, but as they do not appear in the corpus, they will not be used. In addition, some of the failures generate systematic errors that can be corrected either in the alignment phase or in the systems. Only a bad generalization appearing exclusively in the test set will harm translation quality.

4.2.2 Reordering experiments in the EuroParl Es2En task

Given the values of $N_{min}$ and $P_{swap}$ (5 and 0.33, respectively) which gave the smallest Q, we built the Alignment Block Classes. Before applying the algorithm, we added morphological information, lemmas and tags, sequentially: first, we used the lemma alignment to build the LAB and, second, we removed from the list those pairs of blocks which were not constituted by noun plus adjective. Afterwards, we built the Alignment Block Classes (also called RABC) and finished the reordering process. In order to evaluate the alignment after the second step, we undid the reordering without changing the alignment links obtained. Table 3 shows the improvement in AER.

Table 4 shows the improvement in both measures, mWER and BLEU. Three systems are compared: the baseline system without any reordering; the baseline system swapping only Noun+Adj (a standard Spanish-English linguistic reordering rule); and the baseline system using the RABC technique. The last one outperforms the first two in mWER and BLEU. Although most reorderings are due to the Noun+Adj pattern, the RABC is more accurate than the simple rule of swapping them because it is able to statistically learn the Noun+Adj pairs which are reordered, discriminate those that are not reordered, and also add new reorderings of different word categories (e.g. Adv+Adj). If we compare the performance of the RABC algorithm in the two SMT systems, the quality of translation is improved in both cases. However, in the Phrase-based system the RABC seems to perform slightly better.

Table 3 Results on the EuroParl task in the direction Es2En in the AER evaluation

Experiment          Ps     Rs     Fs     Pp     Rp     Fp     AER
Alignment           78.02  74.04  75.98  84.96  56.41  67.80  20.64
Alignment + RABC    79.73  74.98  77.28  86.66  57.00  68.77  19.36
Table 4 Results on the EuroParl Es2En task for the Phrase-based (PB) system and the Ngram-based (NB) system

System  Configuration           mWER    BLEU
PB      Baseline                34.44   55.23
PB      Baseline + N+Adj_swap   34.27   55.43
PB      Baseline + RABC         33.75   56.32
NB      Baseline                34.46   55.24
NB      Baseline + N+Adj_swap   34.30   55.42
NB      Baseline + RABC         33.68   56.26
Fig. 4 Translation examples of the baseline + N+Adj_swap and baseline + RABC systems. The best reorderings are in bold
Analysing the errors, we see that in some cases the Ngram-based system already incorporates a solution for this type of "unknown" word: words which appear "embedded" inside a tuple are handled in Mariño et al. (2006). The results show that the reordering of words can be learnt using a word classification based on the alignment. See some examples comparing the baseline reordering and the proposed method in Fig. 4.

5 Conclusions

In this paper, we have introduced a local reordering approach based on Alignment Block Classes which benefits alignment and translation quality. When dealing with local reorderings, we can infer better reorderings than the ones provided only by the translation units (either phrases or tuples). On the EuroParl task, the Alignment Block Classes (blocks of length 1) have been shown to be useful both in alignment and in translation. In fact, the results show a better performance in several parts of the translation process:

1. The proposed reordering improves the alignment itself because it monotonizes the bilingual text, so the word alignment becomes more robust.
2. The Alignment Block Classification infers better local reorderings than the ones provided only by the translation units (either phrases or tuples).
3. The Alignment Block Classification infers better local reorderings than the ones provided by a well-known standard Spanish-English linguistic rule (Noun+Adj is swapped).
4. Both measures, mWER and BLEU, improve by about 1 point (in percentage).
Further improvements could be expected when dealing with longer blocks and experimenting with less monotonic language pairs. We leave the following points for further research:

1. Evaluate the system on tasks with greater reordering difficulties, such as the German-English pair, and carry out experiments with longer blocks.
2. Deal with the inference of rules based on a wider context, in case morphological information is available.
Acknowledgments This work has been partially funded by the Spanish Department of Science and Innovation through the Juan de la Cierva fellowship program and the BUCEADOR project (TEC2009-14094-C04-01). The authors also want to thank the anonymous reviewers of this paper for their valuable comments. Finally, the authors want to thank Barcelona Media Innovation Center, Universitat Politècnica de Catalunya and TALP Research Center for their support and permission to publish this research.
References

Brants, T. (2000). TnT — a statistical part-of-speech tagger. In Proceedings of the sixth applied natural language processing conference.

Brown, P., Della Pietra, S., Della Pietra, V., & Mercer, R. (1993). The mathematics of statistical machine translation. Computational Linguistics, 19(2), 263–311.

Carreras, X., Chao, I., Padró, L., & Padró, M. (2004). FreeLing: An open-source suite of language analyzers. In 4th international conference on language resources and evaluation, LREC'04, Lisbon, Portugal.

Costa-jussà, M. R., & Fonollosa, J. A. R. (2009). State-of-the-art word reordering approaches in statistical machine translation. IEICE Transactions on Information and Systems, 92(11), 2179–2185.

Costa-jussà, M. R., Fonollosa, J. A. R., & Monte, E. (2008). Using reordering in statistical machine translation based on alignment block classification. In 6th international conference on language resources and evaluation, LREC'08.

de Gispert, A., & Mariño, J. (2003). Experiments in word-ordering and morphological preprocessing for transducer-based statistical machine translation. In IEEE automatic speech recognition and understanding workshop, ASRU'03 (pp. 634–639). St. Thomas, USA.

Kanthak, S., Vilar, D., Matusov, E., Zens, R., & Ney, H. (2005). Novel reordering approaches in phrase-based statistical machine translation. In Proceedings of the ACL workshop on building and using parallel texts: Data-driven machine translation and beyond (pp. 167–174). Ann Arbor, MI.

Kneser, R., & Ney, H. (1995). Improved backing-off for n-gram language modeling. IEEE International Conference on ASSP, 2, 181–184.

Koehn, P., Och, F. J., & Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the human language technology conference, HLT-NAACL'2003 (pp. 48–54). Edmonton, Canada.

Lambert, P. (2008). Exploiting lexical information and discriminative alignment training in statistical machine translation. Ph.D. thesis, Software Department, Universitat Politècnica de Catalunya (UPC).

Mariño, J. B., Banchs, R. E., Crego, J. M., de Gispert, A., Lambert, P., Fonollosa, J. A. R., & Costa-jussà, M. R. (2006). N-gram based machine translation. Computational Linguistics, 32(4), 527–549.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K., & Tengi, R. (1991). Five papers on WordNet. Special Issue of International Journal of Lexicography, 3(4), 235–312.

Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7, 308–313.

Nießen, S., & Ney, H. (2001). Morpho-syntactic analysis for reordering in statistical machine translation. In Proceedings of the MT Summit VII (pp. 247–252).

Och, F. J., & Ney, H. (2003). A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1), 19–51.

Popovic, M., & Ney, H. (2006). POS-based word reorderings for statistical machine translation. In 5th international conference on language resources and evaluation (LREC) (pp. 1278–1283). Genoa.

Stolcke, A. (2002). SRILM — an extensible language modeling toolkit. In Proceedings of the 7th international conference on spoken language processing, ICSLP'02 (pp. 901–904). Denver, USA.

Tillmann, C., & Zhang, T. (2005). A localized prediction model for statistical machine translation. In ACL.

Zens, R., Och, F. J., & Ney, H. (2004). Improvements in phrase-based statistical machine translation. In Proceedings of the human language technology conference, HLT-NAACL'2004 (pp. 257–264). Boston, MA, USA.