) and “sentence” () were included as units of the texts' logical layout. Before alignment, each text was transformed into the TEI-conformant format4. The XAlign system5 was used for the alignment. Starting from the French version, the goal of the alignment was to establish 1:1 relations at the segment level ( tag) with all the other languages. To achieve this, some segments had to be further divided. The total number of segments in all texts is thus 4,409, and the average number of words per language is about 60,000.
4 http://www.tei-c.org/index.xml
5 http://led.loria.fr/download/source/Xalign.zip
SEE-ERA.NET Pilot Joint Call
Project ID: 06-1000031-10503
As this corpus is very small for MT studies (about 25 times smaller than SEnAC), it requires a significant extension6. This type of alignment of bitexts required intensive manual control of the XAlign output. In this way, missing segments and inconsistencies between the source text and its translations were also identified.

FR: Vous savez que cette formalité du visa est inutile, et que nous n'exigeons plus la présentation du passeport?
SR: Vi znate da je ova formalnost viziranja izlišna i da se više ne traži pokazivanje isprava?
BG: Знаете ли, че тази формалност с паспортите е безполезна и че ние вече не изискваме да представяте паспортите си?
EN: You know that a visa is useless, and that no passport is required?
EL: Ξέρετε ότι αυτή η τυπική διαδικασία της βίζας δεν είναι αναγκαία και δεν απαιτείται πλέον η εµφάνιση του διαβατηρίου;
SL: Ali vam je znano, da je ta formalnost vidiranja nepotrebna in da ne zahtevamo več predložitve potnega lista?
RO: Ştiţi cã formalitatea vizei e inutilă şi că noi nu mai cerem prezentarea paşaportului.
Figure 2: A translation unit from the 7-language SEnLC parallel corpus
3. Sub-sentential annotation of the multilingual data
Each project partner took care of the tokenization, morpho-syntactic tagging and lemmatisation of the texts in their own languages, using in-house or public-domain processing tools (adapted for the new languages). For instance, the Romanian, English and French texts were processed with the RACAI tools [8] integrated into the linguistic web-service platform available at http://nlp.racai.ro/webservices/. The German data was processed using Helmut Schmid's TreeTagger (this tagger has been successfully used for German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese and Old French texts, and it is available at http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/). The pre-processing of the Czech part of the corpus was kindly provided by Aleš Horák from the Faculty of Informatics at Masaryk University. After tokenization, tagging and lemmatization, this annotation was added to the XML encoding of the parallel corpus. Depending on the processing tools available for the different languages, additional information could be added to each language-specific segment of a translation unit. Figure 3 shows the representation of the Romanian segment of the translation unit displayed in Figure 1. ...
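As a minimal illustration of how such token-level annotation can be attached to the XML encoding, a tagged token can be serialized as below. The element and attribute names (seg, w, lemma, ana) are hypothetical stand-ins; the actual markup used by the project is the one shown in Figure 3.

```python
import xml.etree.ElementTree as ET

def annotate_segment(tokens):
    """Build an XML segment whose tokens carry lemma and morpho-syntactic
    annotation. Element/attribute names (seg, w, lemma, ana) are illustrative."""
    seg = ET.Element("seg")
    for form, lemma, msd in tokens:
        w = ET.SubElement(seg, "w", lemma=lemma, ana=msd)
        w.text = form
    return ET.tostring(seg, encoding="unicode")

# A Romanian token from Figure 3 (the MSD code is an illustrative guess):
xml = annotate_segment([("Informaţiile", "informaţie", "Ncfpry")])
```
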
6 The "1984" corpus, the encoding of which is available in exactly the same format, is available at the address: http://nl.ijs.si/ME. Although
the average number of tokens in each language of the "1984" corpus is 110,000, the joined SEnLC and "1984" still makes a too small parallel corpus for MT experiments.
Informaţiile culese conform art. 5 ,6 şi 7 trebuie să fie luate în considerare în cadrul procedurii de autorizare . ...
Figure 3: Linguistically analysed sentence (Romanian) of a translation unit of the SEnAC parallel corpus

The tagsets used for all languages (except Bulgarian and German) were compliant with the MULTEXT specifications, for the most part with the MULTEXT-East specifications Version 3 [9] (for the details of the morpho-syntactic annotation, see http://nl.ijs.si/ME/V3/msd/). Table 1 shows some statistics on the pre-processed corpus:

Language   No. of tokens   Avg. no. of tokens/sentence
BG         1,436,925       23.79
CS         1,238,981       20.51
DE         1,314,441       21.76
EL         1,469,642       24.33
EN         1,466,912       24.29
FR         1,527,241       25.29
RO         1,422,995       23.56
SL         1,271,011       21.04

Table 1: Statistical data on the SEnAC parallel corpus

The SEnLC corpus was tokenized, lemmatized and tagged according to the same principles as used in the
SEnAC encoding. The total number of tokens in the French text is 71,793, while the number of unique tokens (types) is 9,433 (token/type ratio 7.6). The figures for the other languages differ: for Serbian the total number of tokens is 58,722 and the number of types is 12,733 (ratio 4.6); for Bulgarian, 58,678 tokens and 11,217 types (ratio 5.2); for Greek, 68,615 tokens and 11,809 types (ratio 5.8). Figure 4 shows the representation of the Serbian segment of the translation unit displayed in Figure 2:

Vi znate da je ova formalnost
viziranja izlišna i da se više ne traži pokazivanje isprava ?
Figure 4: Linguistically analysed sentence (Serbian) of a translation unit of the SEnLC parallel corpus
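The token/type figures reported above reduce to a simple computation; a minimal sketch, run here on a toy token list rather than the actual corpus:

```python
def type_token_stats(tokens):
    """Return (number of tokens, number of types, token/type ratio)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return n_tokens, n_types, n_tokens / n_types

# Toy example; for SEnLC French the reported figures are 71,793 tokens
# and 9,433 types, i.e. a ratio of about 7.6.
tokens = "vi znate da je ova formalnost viziranja izlišna da je".split()
stats = type_token_stats(tokens)
```
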
4. Word Alignment of SEnAC
Based on the monolingual data from the SEnAC we built language models for each language. For Romanian we used the TTL tagging modeller [11], while for the other languages we used the METT tagging modeller [12]. Both language-modelling and tagging systems are able to perform tiered tagging [13], a morpho-syntactic disambiguation method specially designed to work with large (lexicon) tagsets. In order to build the translation models from the linguistically analysed parallel corpus we used GIZA++ [3] and constructed 8 unidirectional translation models (EN-RO, RO-EN, EN-BG, BG-EN, EN-SL, SL-EN, EN-GR, GR-EN). The processing unit considered in each language was not the wordform but the string formed by its lemma and the first two characters of the associated morpho-syntactic tag (e.g. for the wordform "informaţiile" we took the item "informaţie/Nc"). For each language we used 20 iterations (5 for Model 1, 5 for the HMM model, 1 for the HMM-to-Model 3 transfer, 4 for Model 3, 1 for the Model 3-to-Model 4 transfer, and 4 for Model 4). We included neither Model 5 nor Model 6, as we noticed a degradation of the perplexities. Given the formulaic language of the Acquis Communautaire documents, the perplexities of the resulting translation models were encouraging, ranging from 13.07 (RO-EN) to 19.88 (EN-BG). Based on these models we word-aligned the bitexts using the iterative high-precision COWAL reified aligner [14]. The reified alignment method allows one to combine different information sources to guide the identification of the highest-probability translation pairs in a bitext. In our alignment system we used, among others, translation probabilities, string-similarity scores, translation entropy and word positions. Eight such information sources were combined by means of a linear interpolation formula into a scoring function used to build the most probable lexical alignment.
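The lemma-plus-tag-prefix processing unit described above can be sketched as follows (the full tag "Ncfpry" is an illustrative MULTEXT-East-style MSD, not taken from the corpus):

```python
def giza_item(lemma, msd):
    """Form the GIZA++ processing unit used in the experiments: the lemma
    followed by the first two characters of the morpho-syntactic tag,
    e.g. "informaţie/Nc" for the wordform "informaţiile"."""
    return f"{lemma}/{msd[:2]}"

item = giza_item("informaţie", "Ncfpry")
```
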
As described in [14], the translation pairs prescribed by each unidirectional translation model were unconditionally included in the alignment skeleton. The rest of the links were established in the subsequent iterations of the aligner. The SEnAC training corpora, the alignments and the perplexities for each translation model are available at http://www.racai.ro/ResearchActivity/WebServicesandResources/SEEERANETResources/tabid/131/Default.aspx. An alignment viewer and editor (see the Appendices) is also available for visualization and correction of the alignments, with the purpose of further fine-tuning the translation models. Another useful tool available at the same address, implemented by the Slovene partner, builds N-way alignments from existing pairwise alignments, either sentence or word alignments. More specifically, given n pairwise alignments such that one of the corpora is the same in all of the alignments, while the other corpus is different in each, it produces an (n+1)-way alignment. The corpus included in all of the alignments is called the hub corpus. The time complexity of the algorithm is O(S*A*C), where S is the number of sentences in the hub corpus, A is the number of input alignments and C is the average number of corpora per alignment. In the standard scenario of using only pairwise alignments, C = 2. The space complexity of the data structure for the mapping
is O(C*S), where C is the total number of corpora in the resulting alignment and S is the average number of sentences per corpus. This method, which exploits the transitivity of translation equivalence, has many advantages over the direct X1-X2 alignment:
- it allows a multilingual team to share the work so that different partners deal only with known pairs of languages (in our case EN-Xi); having a good command of a given language pair, they can check and correct the alignments of a bitext extracted from a multilingual parallel text;
- the derived alignments Xi-Xj are usually of accuracy comparable to that of the PIVOT-Xi and PIVOT-Xj alignments;
- the derived alignment is much faster than the direct alignment;
- the derived alignment is much cheaper in terms of human expertise, language resources and computing power.
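A minimal sketch of the hub-based merge for the sentence-level case follows; the dictionary representation of an alignment is an assumption for illustration, not the tool's actual data structure.

```python
def nway_from_pairwise(alignments):
    """Merge n pairwise alignments sharing a hub corpus into an (n+1)-way
    alignment. Each input maps a hub sentence id to the aligned sentence id
    in one other corpus; only hub sentences present in every input are kept.
    One pass over the hub sentences reflects the O(S*A) behaviour of the
    standard scenario (C = 2)."""
    hub_ids = set(alignments[0])
    for a in alignments[1:]:
        hub_ids &= set(a)
    return {h: (h,) + tuple(a[h] for a in alignments) for h in sorted(hub_ids)}

# Two pairwise alignments with an English hub (sentence ids are invented):
en_ro = {1: 10, 2: 11}
en_sl = {1: 20, 2: 21}
three_way = nway_from_pairwise([en_ro, en_sl])
```
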
5. Using MOSES Toolkit to Perform Machine Translation
The MOSES toolkit [5] is a public-domain environment developed in the ongoing European project EUROMATRIX, which allows for rapid prototyping of Statistical Machine Translation systems. It assists the developer in constructing the language and translation models for the languages he/she is concerned with and, by its advanced factored decoder and control system, ensures the solving of the fundamental equation of Statistical Machine Translation in a noisy-channel model:

Target* = argmax_Target P(Source | Target) * P(Target)

What is extremely useful is that the environment allows a developer to provide MOSES with externally developed language and translation models, offering means to convert the necessary data structures into the expected format and to further improve them. Once the statistical models are in the prescribed format, the MT system developer may define his/her own factoring strategy. If the information is provided, MOSES can build various factored representations for each of the lexical items (be they words or phrases) to be used in deriving the best translation: occurrence form, lemmatized form, associated part-of-speech or morpho-syntactic tag. By dissociating the treatment of the occurrence form of lexical items into distinct processes (translating the lemma, translating the morpho-syntactic properties of the current item, and generating the target inflected lexical item from the translated lemma and the translated morphological information), the system achieves higher flexibility, a better generalization of the linguistic facts, and more reliable decisions in the quest for the optimal translation. Moreover, the system allows for the integration of higher-order information (shallow or even deep parsing information) in order to improve the reordering of the output lexical items.
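The noisy-channel decision rule above can be sketched over an explicit candidate list; the toy probabilities are invented for illustration, and a real decoder of course searches a far larger hypothesis space:

```python
import math

def best_translation(source, candidates, tm_prob, lm_prob):
    """Noisy-channel decision rule: pick the target maximizing
    P(Source | Target) * P(Target), computed in log space for stability."""
    return max(candidates,
               key=lambda t: math.log(tm_prob(source, t)) + math.log(lm_prob(t)))

# Invented toy distributions for the illustration:
tm = {("casa", "house"): 0.6, ("casa", "home"): 0.4}   # P(Source | Target)
lm = {"house": 0.3, "home": 0.2}                        # P(Target)
best = best_translation("casa", ["house", "home"],
                        lambda s, t: tm[(s, t)], lambda t: lm[t])
```
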
For further details on the Moses Toolkit for Statistical Machine Translation and its tuning, the reader is directed to the EUROMATRIX project web-page http://www.euromatrix.net/ and to the download web-page http://www.statmt.org/moses/.
6. Conclusions
The described language resources, as well as the MOSES toolkit, were used at RACAI to perform a series of very encouraging MT experiments for part of the language pairs of the project, always using English as either the source or the target language (see the Appendices). Due to the limited duration of the project and the limited human resources, as well as to insufficient training data for Serbian, not all the language pairs of the project could be experimented with. However, even the few experiments that were made proved that, whenever adequate language resources are available, the development of reasonably accurate SMT systems for the concerned languages is realistic and feasible within acceptable time limits. The extremely valuable language resources created during this project (carefully analysed and manually corrected where necessary), as well as the publicly available tools, offer all interested parties the possibility to further experiment with new language
pairs. All the partners in this project are working on research proposals or are already involved in ongoing projects that will build on the results of this SEE-ERA.NET project. Future work could address the more challenging tasks of extending the SEnLC literary corpus and building translation models for it and, provided adequate data become available, experiments with other South-East European and Balkan languages.
7. References

[1] Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), pp. 263-311.
[2] Koehn, P., Hoang, H. (2007). Factored Translation Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 868-876.
[3] Och, F. J., Ney, H. (2000). Improved Statistical Alignment Models. In Proceedings of the 38th Annual Meeting of the ACL, Hong Kong, pp. 440-447.
[4] Och, F. J., Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1), pp. 19-51.
[5] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E. (2007). Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic.
[6] Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufiş, D. (2006). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the 5th LREC Conference, Genoa, Italy, 22-28 May, pp. 2142-2147.
[7] Ceauşu, A., Ştefănescu, D., Tufiş, D. (2006). Acquis Communautaire sentence alignment using Support Vector Machines. In Proceedings of the 5th LREC Conference, Genoa, Italy.
[8] Tufiş, D., Ion, R., Ceauşu, A., Ştefănescu, D. (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco. ELRA - European Language Resources Association. ISBN 2-9517408-4-0.
[9] Erjavec, T. (2004). MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC'04, ELRA, Paris, pp. 1535-1538.
[10] Krstev, C., Vitas, D., Erjavec, T. (2004). Morpho-Syntactic Descriptions in MULTEXT-East - the Case of Serbian. Informatica, No. 28, The Slovene Society Informatika, Ljubljana, pp. 431-436.
[11] Ion, R. (2007). Word Sense Disambiguation Methods Applied to English and Romanian. PhD thesis (in Romanian), Romanian Academy, Bucharest, 138 p.
[12] Ceauşu, A. (2006). Maximum Entropy Tiered Tagging. In Janneke Huitink & Sophia Katrenko (eds.), Proceedings of the Eleventh ESSLLI Student Session, pp. 173-179.
[13] Tufiş, D. (1999). Tiered Tagging and Combined Language Models Classifiers. In Václav Matousek, Pavel Mautner, Jana Ocelíková, and Petr Sojka (eds.), Text, Speech and Dialogue (TSD 1999), Lecture Notes in Artificial Intelligence 1692, Springer, Berlin/Heidelberg. ISBN 978-3-540-66494-9, pp. 28-33.
[14] Tufiş, D., Ion, R., Ceauşu, A., Ştefănescu, D. (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, 3-7 April, pp. 153-160.
[15] Tufiş, D., Koeva, S. (2007). Ontology-supported Text Classification Based on Cross-lingual Word Sense Disambiguation. In Francesco Masulli, Sushmita Mitra and Gabriella Pasi (eds.), Applications of Fuzzy Sets Theory. 7th International Workshop on Fuzzy Logic and Applications, WILF 2007, Camogli, Italy, LNAI 4578, Springer, Berlin/Heidelberg, pp. 447-455.
8. Appendices
Snapshot of the alignment visualization of a translation unit (no. 69) in four bitexts
9. Examples of Machine Translation Experiments
In this section we present translation examples obtained with the language and translation models built during the project, fed into the MOSES language-independent translation engine. 500 translation units of the SEnAC were held out from the training process that created the language and translation models; they were used for testing. Below are the original sentences of one translation unit out of the 500 test units. These original (human) translations are the references for the machine translations (in both directions) for the EN-RO, EN-SL and EN-EL language pairs. The tables that follow display the translations produced by our MT prototypes and by Google's MT system. We lowercased both the reference and the automatic translations.

English Original

after consulting the council , the commission shall fix the limits within which the agency may contract loans for a term not exceeding two years . for loans for a term exceeding two years the agency must obtain through the commission the approval of the council acting by a qualified majority in each individual case . the agency may raise loans on the financial markets in a member state subject to the
provisions of law applying to internal loans or , if such provisions do not exist in a member state , after agreement has been reached between such member state and the agency in regard to the proposed loan .

Romanian Original

după consultarea consiliului , comisia stabileşte limitele în cadrul cărora agenţia poate contracta împrumuturi pe un termen de maxim doi ani . pentru împrumuturi pe termen mai mare de doi ani , agenţia trebuie să obţină , prin intermediul comisiei , aprobarea consiliului , care hotărăşte cu majoritate calificată în fiecare caz în parte . agenţia poate contracta împrumuturi pe pieţele financiare ale unui stat membru , sub rezerva dispoziţiilor legale aplicabile împrumuturilor interne , sau , dacă aceste dispoziţii nu există într-un stat membru , în momentul în care statul membru şi agenţia ajung la un acord cu privire la împrumutul propus .

Slovene Original

komisija po posvetovanju s svetom določi meje , v katerih se agencija lahko dogovarja o posojilih z rokom vračila , ki ne presegajo dveh let . za posojila z rokom vračila , ki presega dve leti , mora agencija prek komisije pridobiti odobritev sveta , ki o vsakem primeru odloča s kvalificirano večino . posojila na finančnih trgih države članice lahko agencija najema po pravnih predpisih , ki veljajo za interna posojila , ali če v državi članici takšnih predpisov ni , po sporazumu , ki sta ga o predlaganem posojilu dosegli država članica in agencija .

Greek Original

η επιτροπή , µετά από γνώµη του συµβουλίου , ορίζει τα όρια εντός των οποίων ο οργανισµός δύναται να συνάπτει δάνεια , των οποίων η διάρκεια δεν υπερβαίνει τα δύο έτη . για τα δάνεια που υπερβαίνουν τα δύο έτη , ο οργανισµός πρέπει να ζητεί δια της επιτροπής την έγκριση του συµβουλίου , που αποφασίζει µε ειδική πλειοψηφία χωριστά για κάθε περίπτωση .
ο οργανισµός δύναται να συνάπτει δάνεια στις κεφαλαιαγορές κράτους µέλους , εντός του πλαισίου των νοµοθετικών διατάξεων που ισχύουν για δανεισµούς στο εσωτερικό ή , εφόσον δεν υφίστανται τοιούτου είδους διατάξεις στο κράτος µέλος , εφ ' όσον το κράτος µέλος αυτό και ο οργανισµός συνενοηθούν και συµφωνήσουν ως προς το δάνειο που προτίθεται να συνάψει ο οργανισµός .
Translation from English into Romanian

Our translation:
după consultarea consiliului , comisia stabileşte limitele în care agenţia poate contracta împrumuturi pe o durată de maximum doi ani . pentru împrumuturi pentru un termen de maximum doi ani , agenţia trebuie să obţină prin comisia de aprobare a consiliului care hotărăşte cu majoritate calificată în fiecare caz în parte . agenţia poate contracta împrumuturi pe pieţele financiare într- un stat membru , sub rezerva dispoziţiilor legale aplicabile împrumuturilor interne sau , dacă aceste dispoziţii nu există într- un stat membru , după acord a fost atinsă între aceste state membre şi agenţia în materie la propuse împrumuturi .

Google translation:
după consultarea consiliului, comisiei, stabileşte limitele în care agenţia poate contracta împrumuturi pe un termen de cel mult doi ani. pentru împrumuturi pe un termen mai mare de doi ani, agenţia trebuie să obţină prin intermediul comisiei de aprobarea consiliului, hotărând cu majoritate calificată în fiecare caz individual. agenţia poate ridica de împrumuturi de pe pieţele financiare într-un stat membru, sub rezerva dispoziţiilor de aplicare a legii pentru a împrumuturilor interne, sau, dacă astfel de dispoziţii nu există într-un stat membru, după ce sa ajuns la un acord între astfel de stat membru şi în ceea ce priveşte agenţia propuse pentru a împrumutului.
Translation from English into Slovene

Our translation:
po posvetovanju s svet , komisija določi omejitve , v katerih se lahko agencija pogodbe za posojila , trajanje varstva , ki ne presega dveh let . za posojila za trajanje varstva , ki presega dveh let agencija mora pridobiti z komisije o odobritvi svetu predlog komisije s kvalificirano večino v vsakem posameznem primeru . agencija lahko dvigne posojila na finančnih trgih v državi članici , za katere veljajo določbe zakonodaje , ki veljajo za interna posojila , ali če te določbe ne obstajajo v državi članici , po sporazumu dosežen med temi države članice in agencija v zvezi s predlaganimi posojila .

Google translation:
po posvetovanju s svetom komisija določi omejitve, znotraj katerih lahko agencija najemanje posojil za dobo največ dveh let. za posojila za obdobje več kot dve leti, mora agencija prek komisije pridobiti odobritev sveta, ki odloča s kvalificirano večino v vsakem posameznem primeru. lahko agencija najemati posojil na finančnih trgih v državi članici v skladu z določbami notranjega prava, ki veljajo za posojila ali, če takšni predpisi ne obstajajo v državi članici, potem ko je bil dosežen sporazum med takšne države članice in agenciji v zvezi s k predlagani posojila.
Translation from English into Greek

Our translation:
µετά τη γνώµη του συµβουλίου , η επιτροπή καθορίζει τα όρια εντός των οποίων ο οργανισµός δύναται να συνάπτει δάνεια για µια ένδειξη δεν υπερβαίνει τα δύο έτη . για δάνεια για µια ένδειξη υπερβαίνει τα δύο έτη , ο οργανισµός πρέπει να προέρχεται από την επιτροπή , την έγκριση του συµβουλίου που αποφασίζει µε ειδική πλειοψηφία σε κάθε µεµονωµένη περίπτωση . ο οργανισµός δύναται να συνάπτει δάνεια στις χρηµατοπιστωτικές αγορές σε ένα κράτος µέλος µε την επιφύλαξη των διατάξεων του δικαίου εφαρµόζεται στα εσωτερικά δάνεια ή , εάν οι διατάξεις δεν υφίστανται σε ένα κράτος µέλος , µετά από συµφωνία έχει συµπληρώσει µεταξύ αυτών των κρατών µελών και τον οργανισµό παρέµβασης σε µέριµνα για τα προτεινόµενα δάνεια .

Google translation:
µετά από διαβούλευση µε το συµβούλιο, η επιτροπή καθορίζει τα όρια εντός των οποίων ο οργανισµός δύναται να συνάπτει δάνεια για διάστηµα που δεν υπερβαίνει τα δύο έτη. για τα δάνεια για διάστηµα µεγαλύτερο των δύο ετών ο οργανισµός πρέπει να λάβει η επιτροπή µε την έγκριση του συµβουλίου µε ειδική πλειοψηφία, σε κάθε συγκεκριµένη περίπτωση. ο οργανισµός δύναται να συνάπτει δάνεια στις χρηµατοοικονοµικές αγορές σε ένα κράτος µέλος υπό την επιφύλαξη των διατάξεων του δικαίου που εφαρµόζεται στα εσωτερικά δάνεια ή, αν τέτοιες διατάξεις δεν υπάρχουν σε ένα κράτος µέλος, αφού έχει επιτευχθεί συµφωνία µεταξύ των εν λόγω κράτος µέλος και σε ό, τι αφορά τον οργανισµό µε την προτεινόµενη δανείου.
Translation from Romanian into English

Our translation:
after consulting the council , the commission shall determine the limits within which the agency may raise loans on a period not exceeding two years . for the loan on the time more than two years , the agency must obtain , by means of the commission , the approval of the council , acting by a qualified majority in each individual case . the agency may raise loans on financial markets of a member state , subject to the legal provisions applicable to the internal loans or , if these provisions shall not exist in a member state , within the time in which the member states and the agency reach an agreement on the proposed loan .

Google translation:
after consulting the committee establishes limits within which the agency may contract loans on a maximum period of two years. for term loans of more than two years, the agency must get through committee, council approval, acting by a qualified majority in each case. agency may contract loans in the financial markets of a member state, subject to the laws applicable to domestic loans, or if there are no such provisions in a member state, when the member state and agency reach an agreement regarding the proposed loan .
Translation from Slovene into English

Our translation:
the commission after consultation with the council of the ceilings fixed in which the agency may contract loans with the deadline return which does not exceed two years . for the loan with the deadline return exceeding two years , the agency through the commission to obtain the approval of the council , acting on each case shall act by a qualified majority . loans on financial markets of the member state of the agency may raise the legal provisions applicable to internal loans , or if the member state of such provisions shall , under the agreements which are by the proposed loan achieved by the member states and the agency .

Google translation:
the commission, after consulting with the world determine the extent to which the agency can arrange loans with repayment date, not exceeding two years. for loans with repayment date, in excess of two years, the agency must obtain approval through the council, which on each case by a qualified majority. loans in the financial markets, member states, the agency may hire legal rules that apply to internal loans, or if the member state of such rules, under the agreement, which was reached on the proposed loan member state and the agency.
Translation from Greek into English

Our translation:
the commission , after opinion of the council shall designate the limits within which the agency may contract loans , of which the period not exceeding two years . for the loans exceeding two years , the agency shall be requested by the commission of the approval of the council , acting by a qualified majority separately for each case . the agency may contract loans on the capital markets of a member state , within the framework of the legislative provisions in force for lending in the territory or , if there is no such species provisions in the member state , since of that member state and the agency συνενοηθούν and agreed as to the loans which intend to conclude the agency .

Google translation:
the committee, after obtaining the opinion of the board, defines the limits within which the agency may borrow, whose term not exceeding two years. for loans longer than two years, the agency must request by the approval of the committee of the board, acting by a qualified majority separately for each case. the agency may borrow on capital markets member state, within the framework of the laws that apply to loans within or, if there are no provisions toioutoy kind in the member state where the member state and the agency synenoithoun and agree on the loan which intends to conclude the agency.
9.1. Evaluations

In the following we provide NIST and BLEU evaluations using the latest version of the official NIST mteval tool (ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v12.pl).

Overall Statistics
Language pair         Google NIST   Google BLEU   RACAI NIST   RACAI BLEU
English to Greek         3.5705        0.2934        3.9730       0.3533
English to Slovene       3.5340        0.2653        3.6719       0.2450
English to Romanian      4.4057        0.4508        4.9923       0.5634
Greek to English         3.5427        0.2868        3.8036       0.3005
Slovene to English       4.0424        0.2215        4.0589       0.2293
Romanian to English      4.3573        0.2827        4.6191       0.4862
MT evaluation scorer began on 2009 Jan 13 at 21:43:03
command line: ./mteval-v12.pl -r ref_gr.xml -s src_en.xml -t test_en_gr.xml
Evaluation of en-to-gr translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.5705  BLEU score = 0.2934 for system "google_gr_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.2557   0.2106   0.0778   0.0175   0.0088   0.0089   0.0090   0.0000   0.0000   "google_gr_constrained_primary"
BLEU:   0.5826   0.3333   0.2566   0.1964   0.1532   0.1273   0.1009   0.0741   0.0467   "google_gr_constrained_primary"

# ------------------------------------------------------------------------
Cumulative N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.2557   3.4663   3.5442   3.5617   3.5705   3.5795   3.5885   3.5885   3.5885   "google_gr_constrained_primary"
BLEU:   0.5435   0.4111   0.3433   0.2934   0.2541   0.2238   0.1978   0.1734   0.1487   "google_gr_constrained_primary"

MT evaluation scorer ended on 2009 Jan 13 at 21:43:04
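The individual and cumulative BLEU rows above are related in a simple way: the cumulative score at order n is the geometric mean of the first n individual n-gram precisions, scaled by the brevity penalty. A sketch, using the en-to-gr figures, with the brevity penalty inferred from the 1-gram pair 0.5435/0.5826:

```python
import math

def cumulative_bleu(precisions, brevity_penalty):
    """Cumulative BLEU at each order n: BP times the geometric mean of the
    first n individual n-gram precisions (precisions must be non-zero)."""
    scores, log_sum = [], 0.0
    for n, p in enumerate(precisions, start=1):
        log_sum += math.log(p)
        scores.append(brevity_penalty * math.exp(log_sum / n))
    return scores

# Individual 1..4-gram BLEU precisions from the en-to-gr table above;
# scores[3] reproduces the reported cumulative 4-gram BLEU of 0.2934.
scores = cumulative_bleu([0.5826, 0.3333, 0.2566, 0.1964], 0.5435 / 0.5826)
```
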
MT evaluation scorer began on 2009 Jan 13 at 21:48:07
command line: ./mteval-v12.pl -r ref_sl.xml -s src_en.xml -t test_en_sl.xml
Evaluation of en-to-sl translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.5340  BLEU score = 0.2653 for system "google_sl_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.3472   0.1626   0.0242   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   "google_sl_constrained_primary"
BLEU:   0.5816   0.2990   0.2083   0.1368   0.0957   0.0753   0.0652   0.0549   0.0444   "google_sl_constrained_primary"

# ------------------------------------------------------------------------
Cumulative N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.3472   3.5098   3.5340   3.5340   3.5340   3.5340   3.5340   3.5340   3.5340   "google_sl_constrained_primary"
BLEU:   0.5816   0.4170   0.3309   0.2653   0.2164   0.1815   0.1568   0.1375   0.1213   "google_sl_constrained_primary"

MT evaluation scorer ended on 2009 Jan 13 at 21:48:07
MT evaluation scorer began on 2009 Jan 15 at 14:31:43
command line: ./mteval-v12.pl -r ref_ro.xml -s src_en.xml -t test_en_ro.xml
Evaluation of en-to-ro translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.4057  BLEU score = 0.4508 for system "google_ro_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9796  0.3605  0.0395  0.0174  0.0088  0.0088  0.0000  0.0000  0.0000
BLEU:  0.6949  0.5385  0.3966  0.2783  0.2105  0.1504  0.0893  0.0631  0.0364

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9796  4.3400  4.3796  4.3969  4.4057  4.4146  4.4146  4.4146  4.4146
BLEU:  0.6949  0.6117  0.5294  0.4508  0.3871  0.3307  0.2743  0.2282  0.1861
MT evaluation scorer ended on 2009 Jan 15 at 14:31:43
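Unlike BLEU, the cumulative NIST row is additive: the cumulative n-gram score is the running sum of the individual 1- to n-gram contributions, which is why it plateaus once the higher-order precisions reach zero. A sketch checking this against the en-to-ro run above (variable names are ours):

```python
from itertools import accumulate

# Individual NIST n-gram contributions reported for
# "google_ro_constrained_primary" (en-to-ro):
individual = [3.9796, 0.3605, 0.0395, 0.0174, 0.0088, 0.0088, 0.0, 0.0, 0.0]
cumulative = [round(s, 4) for s in accumulate(individual)]
print(cumulative)  # matches the reported cumulative NIST row up to rounding
```

The running sums agree with the reported row (3.9796, 4.3400, 4.3796, ...) to within 0.0001, the discrepancy coming only from the rounding of the individual values.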
MT evaluation scorer began on 2009 Jan 15 at 14:34:58
command line: ./mteval-v12.pl -r ref_en.xml -s src_gr.xml -t test_gr_en.xml
Evaluation of gr-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.5427  BLEU score = 0.2868 for system "google_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.1278  0.3322  0.0736  0.0091  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.5929  0.3482  0.2252  0.1455  0.0917  0.0463  0.0280  0.0094  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.1278  3.4600  3.5336  3.5427  3.5427  3.5427  3.5427  3.5427  3.5427
BLEU:  0.5929  0.4544  0.3596  0.2868  0.2283  0.1750  0.1347  0.0966  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:34:58
MT evaluation scorer began on 2009 Jan 15 at 14:36:59
command line: ./mteval-v12.pl -r ref_en.xml -s src_sl.xml -t test_sl_en.xml
Evaluation of sl-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.0424  BLEU score = 0.2215 for system "google_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6381  0.3313  0.0634  0.0097  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.6827  0.3301  0.1765  0.0792  0.0400  0.0202  0.0102  0.0000  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6381  3.9693  4.0327  4.0424  4.0424  4.0424  4.0424  4.0424  4.0424
BLEU:  0.6383  0.4438  0.3191  0.2215  0.1552  0.1092  0.0771  0.0000  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:36:59
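In the sl-to-en run the individual 1-gram precision (0.6827) exceeds the cumulative 1-gram BLEU (0.6383): the difference is the brevity penalty, which BLEU applies to all cumulative scores when the candidate is shorter than the reference. A sketch of the standard penalty, BP = exp(1 - r/c) for candidate length c below reference length r; the example lengths below are illustrative, not taken from the scorer log:

```python
import math

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty: 1.0 when the candidate is at least
    as long as the reference, exp(1 - r/c) when it is shorter."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# The penalty implied by this run is about 0.6383 / 0.6827:
print(round(0.6383 / 0.6827, 3))  # 0.935
# Hypothetical lengths producing a penalty in that neighbourhood:
print(round(brevity_penalty(100, 107), 3))
```

This also explains why the Moses and Google runs can have identical individual 1-gram precisions but different cumulative rows when their output lengths differ.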
MT evaluation scorer began on 2009 Jan 15 at 14:33:55
command line: ./mteval-v12.pl -r ref_en.xml -s src_ro.xml -t test_ro_en.xml
Evaluation of ro-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.3573  BLEU score = 0.2827 for system "google_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9133  0.3347  0.0797  0.0197  0.0100  0.0000  0.0000  0.0000  0.0000
BLEU:  0.7629  0.4271  0.2526  0.1383  0.0860  0.0543  0.0330  0.0111  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9133  4.2480  4.3277  4.3474  4.3573  4.3573  4.3573  4.3573  4.3573
BLEU:  0.6604  0.4941  0.3765  0.2827  0.2165  0.1679  0.1303  0.0941  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:33:55
MT evaluation scorer began on 2009 Jan 13 at 21:40:13
command line: ./mteval-v12.pl -r ref_gr.xml -s src_en.xml -t test_en_gr.xml
Evaluation of en-to-gr translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.9730  BLEU score = 0.3533 for system "moses_gr_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_gr_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.5693  0.2813  0.0871  0.0265  0.0089  0.0090  0.0091  0.0000  0.0000
BLEU:  0.6372  0.4375  0.3243  0.2455  0.1835  0.1389  0.1028  0.0660  0.0381

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_gr_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.5693  3.8506  3.9377  3.9641  3.9730  3.9820  3.9911  3.9911  3.9911
BLEU:  0.5832  0.4833  0.4108  0.3533  0.3045  0.2632  0.2272  0.1926  0.1593
MT evaluation scorer ended on 2009 Jan 13 at 21:40:13
MT evaluation scorer began on 2009 Jan 13 at 21:45:57
command line: ./mteval-v12.pl -r ref_sl.xml -s src_en.xml -t test_en_sl.xml
Evaluation of en-to-sl translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.6719  BLEU score = 0.2450 for system "moses_sl_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_sl_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.2759  0.3271  0.0690  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.5922  0.3235  0.1881  0.1000  0.0606  0.0408  0.0309  0.0208  0.0105

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_sl_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.2759  3.6030  3.6719  3.6719  3.6719  3.6719  3.6719  3.6719  3.6719
BLEU:  0.5922  0.4377  0.3303  0.2450  0.1853  0.1440  0.1156  0.0933  0.0732
MT evaluation scorer ended on 2009 Jan 13 at 21:45:57
MT evaluation scorer began on 2009 Jan 15 at 14:18:18
command line: ./mteval-v12.pl -r ref_ro.xml -s src_en.xml -t test_en_ro.xml
Evaluation of en-to-ro translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.9923  BLEU score = 0.5634 for system "moses_ro_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  4.5353  0.3713  0.0555  0.0201  0.0101  0.0102  0.0000  0.0000  0.0000
BLEU:  0.7843  0.6238  0.5300  0.4545  0.3878  0.3196  0.2708  0.2211  0.1809

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  4.5353  4.9066  4.9621  4.9822  4.9923  5.0026  5.0026  5.0026  5.0026
BLEU:  0.7542  0.6725  0.6131  0.5634  0.5187  0.4754  0.4362  0.3987  0.3636
MT evaluation scorer ended on 2009 Jan 15 at 14:18:18
MT evaluation scorer began on 2009 Jan 15 at 14:28:04
command line: ./mteval-v12.pl -r ref_en.xml -s src_gr.xml -t test_gr_en.xml
Evaluation of gr-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.8036  BLEU score = 0.3005 for system "moses_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.3365  0.3760  0.0664  0.0246  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.6320  0.3710  0.2358  0.1475  0.0909  0.0417  0.0252  0.0169  0.0085

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.3365  3.7125  3.7790  3.8036  3.8036  3.8036  3.8036  3.8036  3.8036
BLEU:  0.6320  0.4842  0.3809  0.3005  0.2366  0.1771  0.1341  0.1035  0.0785
MT evaluation scorer ended on 2009 Jan 15 at 14:28:05
MT evaluation scorer began on 2009 Jan 15 at 14:29:01
command line: ./mteval-v12.pl -r ref_en.xml -s src_sl.xml -t test_sl_en.xml
Evaluation of sl-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.0589  BLEU score = 0.2293 for system "moses_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6350  0.3516  0.0541  0.0182  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.6726  0.3304  0.1712  0.0727  0.0275  0.0093  0.0000  0.0000  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6350  3.9866  4.0407  4.0589  4.0589  4.0589  4.0589  4.0589  4.0589
BLEU:  0.6726  0.4714  0.3363  0.2293  0.1501  0.0943  0.0000  0.0000  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:29:01
MT evaluation scorer began on 2009 Jan 15 at 14:30:14
command line: ./mteval-v12.pl -r ref_en.xml -s src_ro.xml -t test_ro_en.xml
Evaluation of ro-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.6191  BLEU score = 0.4682 for system "moses_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9877  0.5150  0.0819  0.0259  0.0087  0.0000  0.0000  0.0000  0.0000
BLEU:  0.7311  0.5424  0.4017  0.3017  0.2261  0.1667  0.1239  0.0893  0.0721

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9877  4.5026  4.5845  4.6104  4.6191  4.6191  4.6191  4.6191  4.6191
BLEU:  0.7311  0.6297  0.5421  0.4682  0.4048  0.3491  0.3011  0.2587  0.2244
MT evaluation scorer ended on 2009 Jan 15 at 14:30:15