) and “sentence” () were included as units of the texts' logical layout. Before alignment, each text was transformed into the TEI-conformant format4. The XAlign system5 was used for the alignment. Starting from the French version, the goal of the alignment was to establish 1:1 relations at the segment level ( tag) with all the other languages. To achieve this, some segments had to be further divided. The total number of segments in all texts is thus 4,409, and the average number of words per language is about 60,000.
4 http://www.tei-c.org/index.xml
5 http://led.loria.fr/download/source/Xalign.zip
SEE-ERA.NET Pilot Joint Call
Project ID: 06-1000031-10503
As this corpus is very small for MT studies (about 25 times smaller than SEnAC), it requires a significant extension6. This type of alignment of bitexts required intensive manual control of the XAlign output. In this way, missing segments and inconsistencies between the source text and its translations were also identified.

FR: Vous savez que cette formalité du visa est inutile, et que nous n'exigeons plus la présentation du passeport?
SR: Vi znate da je ova formalnost viziranja izlišna i da se više ne traži pokazivanje isprava?
BG: Знаете ли, че тази формалност с паспортите е безполезна и че ние вече не изискваме да представяте паспортите си?
EN: You know that a visa is useless, and that no passport is required?
EL: Ξέρετε ότι αυτή η τυπική διαδικασία της βίζας δεν είναι αναγκαία και δεν απαιτείται πλέον η εµφάνιση του διαβατηρίου;
SL: Ali vam je znano, da je ta formalnost vidiranja nepotrebna in da ne zahtevamo več predložitve potnega lista?
RO: Ştiţi cã formalitatea vizei e inutilă şi că noi nu mai cerem prezentarea paşaportului.
Figure 2: A translation unit from the 7-language SEnLC parallel corpus
3. Sub-sentential annotation of the multilingual data
Each project partner took care of the tokenization, morpho-syntactic tagging and lemmatisation of the texts in their own languages, using in-house or public-domain processing tools (adapted for the new languages). For instance, the Romanian, English and French texts were processed with the RACAI tools [8] integrated into the linguistic web-service platform available at http://nlp.racai.ro/webservices/. The German data was processed using Helmut Schmid's TreeTagger (this tagger has been successfully used for German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese and Old French texts, and it is available at http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/). The pre-processing of the Czech part of the corpus was kindly provided by Aleš Horák from the Faculty of Informatics at Masaryk University. After tokenization, tagging and lemmatization, this annotation was added to the XML encoding of the parallel corpus. Depending on the processing tools available for the different languages, additional information could be added to each language-specific segment of a translation unit. Figure 3 shows the representation of the Romanian segment of the translation unit displayed in Figure 1. ...
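As a minimal illustration of how such token-level annotation can be attached to the XML encoding, a tagged token can be serialized as below. The element and attribute names (seg, w, lemma, ana) are hypothetical stand-ins; the actual markup used by the project is the one shown in Figure 3.

```python
import xml.etree.ElementTree as ET

def annotate_segment(tokens):
    """Build an XML segment whose tokens carry lemma and morpho-syntactic
    annotation. Element/attribute names (seg, w, lemma, ana) are illustrative."""
    seg = ET.Element("seg")
    for form, lemma, msd in tokens:
        w = ET.SubElement(seg, "w", lemma=lemma, ana=msd)
        w.text = form
    return ET.tostring(seg, encoding="unicode")

# A Romanian token from Figure 3 (the MSD code is an illustrative guess):
xml = annotate_segment([("Informaţiile", "informaţie", "Ncfpry")])
```
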
6 The "1984" corpus, the encoding of which is available in exactly the same format, is available at the address: http://nl.ijs.si/ME. Although
the average number of tokens in each language of the "1984" corpus is 110,000, the joined SEnLC and "1984" still makes a too small parallel corpus for MT experiments.
Informaţiile culese conform art. 5 ,6 şi 7 trebuie să fie luate în considerare în cadrul procedurii de autorizare . ...
Figure 3: Linguistically analysed sentence (Romanian) of a translation unit of the SEnAC parallel corpus

The tagsets used for all languages (except Bulgarian and German) were compliant with the MULTEXT specifications, for the most part with the MULTEXT-East specifications Version 3 [9] (for the details of the morpho-syntactic annotation, see http://nl.ijs.si/ME/V3/msd/). Table 1 shows some statistics on the pre-processed corpus:

Language   No. of tokens   Avg. no. of tokens/sentence
BG         1,436,925       23.79
CS         1,238,981       20.51
DE         1,314,441       21.76
EL         1,469,642       24.33
EN         1,466,912       24.29
FR         1,527,241       25.29
RO         1,422,995       23.56
SL         1,271,011       21.04

Table 1: Statistical data on the SEnAC parallel corpus

The SEnLC corpus was tokenized, lemmatized and tagged according to the same principles as used in the
SEnAC encoding. The total number of tokens in the French text is 71,793, while the number of unique tokens (types) is 9,433 (token/type ratio 7.6). The figures for the other languages differ: for Serbian the total number of tokens is 58,722 and the number of types is 12,733 (ratio 4.6); for Bulgarian, 58,678 tokens and 11,217 types (ratio 5.2); for Greek, 68,615 tokens and 11,809 types (ratio 5.8). Figure 4 shows the representation of the Serbian segment of the translation unit displayed in Figure 2:

Vi znate da je ova formalnost
viziranja izlišna i da se više ne traži pokazivanje isprava ?
Figure 4: Linguistically analysed sentence (Serbian) of a translation unit of the SEnLC parallel corpus
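The token/type figures reported above reduce to a simple computation; a minimal sketch, run here on a toy token list rather than the actual corpus:

```python
def type_token_stats(tokens):
    """Return (number of tokens, number of types, token/type ratio)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return n_tokens, n_types, n_tokens / n_types

# Toy example; for SEnLC French the reported figures are 71,793 tokens
# and 9,433 types, i.e. a ratio of about 7.6.
tokens = "vi znate da je ova formalnost viziranja izlišna da je".split()
stats = type_token_stats(tokens)
```
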
4. Word Alignment of SEnAC
Based on the monolingual data from the SEnAC we built language models for each language. For Romanian we used the TTL tagging modeller [11], while for the other languages we used the METT tagging modeller [12]. Both language-modelling and tagging systems are able to perform tiered tagging [13], a morpho-syntactic disambiguation method specially designed to work with large (lexicon) tagsets. In order to build the translation models from the linguistically analysed parallel corpus we used GIZA++ [3] and constructed 8 unidirectional translation models (EN-RO, RO-EN, EN-BG, BG-EN, EN-SL, SL-EN, EN-GR, GR-EN). The processing unit considered in each language was not the wordform but the string formed by its lemma and the first two characters of the associated morpho-syntactic tag (e.g. for the wordform "informaţiile" we took the item "informaţie/Nc"). For each language we used 20 iterations (5 for Model 1, 5 for the HMM model, 1 for the HMM-to-Model 3 transfer, 4 for Model 3, 1 for the Model 3-to-Model 4 transfer, and 4 for Model 4). We included neither Model 5 nor Model 6, as we noticed a degradation of the perplexities. Given the formulaic language of the Acquis Communautaire documents, the perplexities of the resulting translation models were encouraging, ranging from 13.07 (RO-EN) to 19.88 (EN-BG). Based on these models we word-aligned the bitexts using the iterative high-precision COWAL reified aligner [14]. The reified alignment method allows one to combine different information sources to guide the identification of the highest-probability translation pairs in a bitext. In our alignment system we used, among others, translation probabilities, string-similarity scores, translation entropy and word positions. Eight such information sources were combined by means of a linear interpolation formula into a scoring function used to build the most probable lexical alignment.
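The lemma-plus-tag-prefix processing unit described above can be sketched as follows (the full tag "Ncfpry" is an illustrative MULTEXT-East-style MSD, not taken from the corpus):

```python
def giza_item(lemma, msd):
    """Form the GIZA++ processing unit used in the experiments: the lemma
    followed by the first two characters of the morpho-syntactic tag,
    e.g. "informaţie/Nc" for the wordform "informaţiile"."""
    return f"{lemma}/{msd[:2]}"

item = giza_item("informaţie", "Ncfpry")
```
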
As described in [14], the translation pairs prescribed by each unidirectional translation model were unconditionally included in the alignment skeleton. The rest of the links were established in the subsequent iterations of the aligner. The SEnAC training corpora, the alignments and the perplexities for each translation model are available at http://www.racai.ro/ResearchActivity/WebServicesandResources/SEEERANETResources/tabid/131/Default.aspx. An alignment viewer and editor (see the Appendices) is also available for visualization and correction of the alignments, with the purpose of further fine-tuning the translation models. Another useful tool available at the same address, implemented by the Slovene partner, builds N-way alignments from existing pairwise alignments, either sentence or word alignments. More specifically, given n pairwise alignments such that one of the corpora is the same in all of the alignments, while the other corpus is different in each, it produces an (n+1)-way alignment. The corpus included in all of the alignments is called the hub corpus. The time complexity of the algorithm is O(S*A*C), where S is the number of sentences in the hub corpus, A is the number of input alignments and C is the average number of corpora per alignment. In the standard scenario of using only pairwise alignments, C = 2. The space complexity of the data structure for the mapping
is O(C*S), where C is the total number of corpora in the resulting alignment and S is the average number of sentences per corpus. This method, which exploits the transitivity of translation equivalence, has many advantages over the direct X1-X2 alignment:
- it allows a multilingual team to share the work so that different partners deal only with known pairs of languages (in our case EN-Xi); having a good command of a given language pair, they can check and correct the alignments of a bitext extracted from a multilingual parallel text;
- the derived alignments Xi-Xj are usually of accuracy comparable to that of the PIVOT-Xi and PIVOT-Xj alignments;
- the derived alignment is much faster than the direct alignment;
- the derived alignment is much cheaper in terms of human expertise, language resources and computing power.
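A minimal sketch of the hub-based merge for the sentence-level case follows; the dictionary representation of an alignment is an assumption for illustration, not the tool's actual data structure.

```python
def nway_from_pairwise(alignments):
    """Merge n pairwise alignments sharing a hub corpus into an (n+1)-way
    alignment. Each input maps a hub sentence id to the aligned sentence id
    in one other corpus; only hub sentences present in every input are kept.
    One pass over the hub sentences reflects the O(S*A) behaviour of the
    standard scenario (C = 2)."""
    hub_ids = set(alignments[0])
    for a in alignments[1:]:
        hub_ids &= set(a)
    return {h: (h,) + tuple(a[h] for a in alignments) for h in sorted(hub_ids)}

# Two pairwise alignments with an English hub (sentence ids are invented):
en_ro = {1: 10, 2: 11}
en_sl = {1: 20, 2: 21}
three_way = nway_from_pairwise([en_ro, en_sl])
```
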
5. Using MOSES Toolkit to Perform Machine Translation
The MOSES toolkit [5] is a public-domain environment developed in the ongoing European project EUROMATRIX, which allows for rapid prototyping of Statistical Machine Translation systems. It assists the developer in constructing the language and translation models for the languages he/she is concerned with and, by its advanced factored decoder and control system, ensures the solving of the fundamental equation of Statistical Machine Translation in a noisy-channel model:

Target* = argmax_Target P(Source | Target) * P(Target)

What is extremely useful is that the environment allows a developer to provide MOSES with externally developed language and translation models, offering means to convert the necessary data structures into the expected format and to further improve them. Once the statistical models are in the prescribed format, the MT system developer may define his/her own factoring strategy. If the information is provided, MOSES can build various factored representations for each of the lexical items (be they words or phrases) to be used in deriving the best translation: occurrence form, lemmatized form, associated part-of-speech or morpho-syntactic tag. By dissociating the treatment of the occurrence form of lexical items into distinct processes (translating the lemma, translating the morpho-syntactic properties of the current item, and generating the target inflected lexical item from the translated lemma and the translated morphological information), the system achieves higher flexibility, a better generalization of the linguistic facts, and more reliable decisions in the quest for the optimal translation. Moreover, the system allows for the integration of higher-order information (shallow or even deep parsing information) in order to improve the reordering of the output lexical items.
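The noisy-channel decision rule above can be sketched over an explicit candidate list; the toy probabilities are invented for illustration, and a real decoder of course searches a far larger hypothesis space:

```python
import math

def best_translation(source, candidates, tm_prob, lm_prob):
    """Noisy-channel decision rule: pick the target maximizing
    P(Source | Target) * P(Target), computed in log space for stability."""
    return max(candidates,
               key=lambda t: math.log(tm_prob(source, t)) + math.log(lm_prob(t)))

# Invented toy distributions for the illustration:
tm = {("casa", "house"): 0.6, ("casa", "home"): 0.4}   # P(Source | Target)
lm = {"house": 0.3, "home": 0.2}                        # P(Target)
best = best_translation("casa", ["house", "home"],
                        lambda s, t: tm[(s, t)], lambda t: lm[t])
```
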
For further details on the Moses Toolkit for Statistical Machine Translation and its tuning, the reader is directed to the EUROMATRIX project web-page http://www.euromatrix.net/ and to the download web-page http://www.statmt.org/moses/.
6. Conclusions
The described language resources, as well as the MOSES toolkit, were used at RACAI to perform a series of very encouraging MT experiments for part of the language pairs of the project, always using English as either the source or the target language (see the Appendices). Due to the limited duration of the project and the limited human resources, as well as to insufficient training data for Serbian, not all the language pairs of the project could be experimented with. However, even the few experiments that were made proved that, whenever adequate language resources are available, the development of reasonably accurate SMT systems for the concerned languages is realistic and feasible within acceptable time limits. The extremely valuable language resources created during this project (carefully analysed and manually corrected where necessary), as well as the publicly available tools, offer all interested parties the possibility to further experiment with new language
pairs. All the partners in this project are working on research proposals or are already involved in ongoing projects that will build on the results of this SEE-ERA.NET project. Future work could address the more challenging tasks of extending the SEnLC literary corpus and building translation models for it and, provided adequate data become available, experiments with other South-East European and Balkan languages.
7. References

[1] Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), pp. 263-311.
[2] Koehn, P., Hoang, H. (2007). Factored Translation Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 868-876.
[3] Och, F. J., Ney, H. (2000). Improved Statistical Alignment Models. In Proceedings of the 38th Annual Meeting of the ACL, Hong Kong, pp. 440-447.
[4] Och, F. J., Ney, H. (2003). A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1), pp. 19-51.
[5] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., Dyer, C., Bojar, O., Constantin, A., Herbst, E. (2007). Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic.
[6] Steinberger, R., Pouliquen, B., Widiger, A., Ignat, C., Erjavec, T., Tufiş, D. (2006). The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. In Proceedings of the 5th LREC Conference, Genoa, Italy, 22-28 May, pp. 2142-2147.
[7] Ceauşu, A., Ştefănescu, D., Tufiş, D. (2006). Acquis Communautaire sentence alignment using Support Vector Machines. In Proceedings of the 5th LREC Conference, Genoa, Italy.
[8] Tufiş, D., Ion, R., Ceauşu, A., Ştefănescu, D. (2008). RACAI's Linguistic Web Services. In Proceedings of the 6th Language Resources and Evaluation Conference - LREC 2008, Marrakech, Morocco. ELRA - European Language Resources Association. ISBN 2-9517408-4-0.
[9] Erjavec, T. (2004). MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC'04, ELRA, Paris, pp. 1535-1538.
[10] Krstev, C., Vitas, D., Erjavec, T. (2004). Morpho-Syntactic Descriptions in MULTEXT-East - the Case of Serbian. Informatica, No. 28, The Slovene Society Informatika, Ljubljana, pp. 431-436.
[11] Ion, R. (2007). Word Sense Disambiguation Methods Applied to English and Romanian. PhD thesis (in Romanian), Romanian Academy, Bucharest, 138 p.
[12] Ceauşu, A. (2006). Maximum Entropy Tiered Tagging. In Janneke Huitink & Sophia Katrenko (eds.), Proceedings of the Eleventh ESSLLI Student Session, pp. 173-179.
[13] Tufiş, D. (1999). Tiered Tagging and Combined Language Models Classifiers. In Václav Matousek, Pavel Mautner, Jana Ocelíková, and Petr Sojka (eds.), Text, Speech and Dialogue (TSD 1999), Lecture Notes in Artificial Intelligence 1692, Springer, Berlin/Heidelberg. ISBN 978-3-540-66494-9, pp. 28-33.
[14] Tufiş, D., Ion, R., Ceauşu, A., Ştefănescu, D. (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, 3-7 April, pp. 153-160.
[15] Tufiş, D., Koeva, S. (2007). Ontology-supported Text Classification Based on Cross-lingual Word Sense Disambiguation. In Francesco Masulli, Sushmita Mitra and Gabriella Pasi (eds.), Applications of Fuzzy Sets Theory. 7th International Workshop on Fuzzy Logic and Applications, WILF 2007, Camogli, Italy, LNAI 4578, Springer, Berlin/Heidelberg, pp. 447-455.
8. Appendices
Snapshot of the alignment visualization of a translation unit (no. 69) in four bitexts
9. Examples of Machine Translation Experiments
In this section we present translation examples obtained with the language and translation models built during the project, fed into the MOSES language-independent translation engine. 500 translation units of the SEnAC were held out from the training process that created the language and translation models; they were used for testing. Below are the original sentences of one translation unit out of the 500 test units. These original (human) translations are the references for the machine translations (in both directions) for the EN-RO, EN-SL and EN-EL language pairs. The tables that follow display the translations produced by our MT prototypes and by Google's MT system. We lowercased both the reference and the automatic translations.

English Original

after consulting the council , the commission shall fix the limits within which the agency may contract loans for a term not exceeding two years . for loans for a term exceeding two years the agency must obtain through the commission the approval of the council acting by a qualified majority in each individual case . the agency may raise loans on the financial markets in a member state subject to the
provisions of law applying to internal loans or , if such provisions do not exist in a member state , after agreement has been reached between such member state and the agency in regard to the proposed loan .

Romanian Original

după consultarea consiliului , comisia stabileşte limitele în cadrul cărora agenţia poate contracta împrumuturi pe un termen de maxim doi ani . pentru împrumuturi pe termen mai mare de doi ani , agenţia trebuie să obţină , prin intermediul comisiei , aprobarea consiliului , care hotărăşte cu majoritate calificată în fiecare caz în parte . agenţia poate contracta împrumuturi pe pieţele financiare ale unui stat membru , sub rezerva dispoziţiilor legale aplicabile împrumuturilor interne , sau , dacă aceste dispoziţii nu există într-un stat membru , în momentul în care statul membru şi agenţia ajung la un acord cu privire la împrumutul propus .

Slovene Original

komisija po posvetovanju s svetom določi meje , v katerih se agencija lahko dogovarja o posojilih z rokom vračila , ki ne presegajo dveh let . za posojila z rokom vračila , ki presega dve leti , mora agencija prek komisije pridobiti odobritev sveta , ki o vsakem primeru odloča s kvalificirano večino . posojila na finančnih trgih države članice lahko agencija najema po pravnih predpisih , ki veljajo za interna posojila , ali če v državi članici takšnih predpisov ni , po sporazumu , ki sta ga o predlaganem posojilu dosegli država članica in agencija .

Greek Original

η επιτροπή , µετά από γνώµη του συµβουλίου , ορίζει τα όρια εντός των οποίων ο οργανισµός δύναται να συνάπτει δάνεια , των οποίων η διάρκεια δεν υπερβαίνει τα δύο έτη . για τα δάνεια που υπερβαίνουν τα δύο έτη , ο οργανισµός πρέπει να ζητεί δια της επιτροπής την έγκριση του συµβουλίου , που αποφασίζει µε ειδική πλειοψηφία χωριστά για κάθε περίπτωση .
ο οργανισµός δύναται να συνάπτει δάνεια στις κεφαλαιαγορές κράτους µέλους , εντός του πλαισίου των νοµοθετικών διατάξεων που ισχύουν για δανεισµούς στο εσωτερικό ή , εφόσον δεν υφίστανται τοιούτου είδους διατάξεις στο κράτος µέλος , εφ ' όσον το κράτος µέλος αυτό και ο οργανισµός συνενοηθούν και συµφωνήσουν ως προς το δάνειο που προτίθεται να συνάψει ο οργανισµός .
Translation from English into Romanian

Our translation:
după consultarea consiliului , comisia stabileşte limitele în care agenţia poate contracta împrumuturi pe o durată de maximum doi ani . pentru împrumuturi pentru un termen de maximum doi ani , agenţia trebuie să obţină prin comisia de aprobare a consiliului care hotărăşte cu majoritate calificată în fiecare caz în parte . agenţia poate contracta împrumuturi pe pieţele financiare într- un stat membru , sub rezerva dispoziţiilor legale aplicabile împrumuturilor interne sau , dacă aceste dispoziţii nu există într- un stat membru , după acord a fost atinsă între aceste state membre şi agenţia în materie la propuse împrumuturi .

Google translation:
după consultarea consiliului, comisiei, stabileşte limitele în care agenţia poate contracta împrumuturi pe un termen de cel mult doi ani. pentru împrumuturi pe un termen mai mare de doi ani, agenţia trebuie să obţină prin intermediul comisiei de aprobarea consiliului, hotărând cu majoritate calificată în fiecare caz individual. agenţia poate ridica de împrumuturi de pe pieţele financiare într-un stat membru, sub rezerva dispoziţiilor de aplicare a legii pentru a împrumuturilor interne, sau, dacă astfel de dispoziţii nu există într-un stat membru, după ce sa ajuns la un acord între astfel de stat membru şi în ceea ce priveşte agenţia propuse pentru a împrumutului.
Translation from English into Slovene

Our translation:
po posvetovanju s svet , komisija določi omejitve , v katerih se lahko agencija pogodbe za posojila , trajanje varstva , ki ne presega dveh let . za posojila za trajanje varstva , ki presega dveh let agencija mora pridobiti z komisije o odobritvi svetu predlog komisije s kvalificirano večino v vsakem posameznem primeru . agencija lahko dvigne posojila na finančnih trgih v državi članici , za katere veljajo določbe zakonodaje , ki veljajo za interna posojila , ali če te določbe ne obstajajo v državi članici , po sporazumu dosežen med temi države članice in agencija v zvezi s predlaganimi posojila .

Google translation:
po posvetovanju s svetom komisija določi omejitve, znotraj katerih lahko agencija najemanje posojil za dobo največ dveh let. za posojila za obdobje več kot dve leti, mora agencija prek komisije pridobiti odobritev sveta, ki odloča s kvalificirano večino v vsakem posameznem primeru. lahko agencija najemati posojil na finančnih trgih v državi članici v skladu z določbami notranjega prava, ki veljajo za posojila ali, če takšni predpisi ne obstajajo v državi članici, potem ko je bil dosežen sporazum med takšne države članice in agenciji v zvezi s k predlagani posojila.
Translation from English into Greek

Our translation:
µετά τη γνώµη του συµβουλίου , η επιτροπή καθορίζει τα όρια εντός των οποίων ο οργανισµός δύναται να συνάπτει δάνεια για µια ένδειξη δεν υπερβαίνει τα δύο έτη . για δάνεια για µια ένδειξη υπερβαίνει τα δύο έτη , ο οργανισµός πρέπει να προέρχεται από την επιτροπή , την έγκριση του συµβουλίου που αποφασίζει µε ειδική πλειοψηφία σε κάθε µεµονωµένη περίπτωση . ο οργανισµός δύναται να συνάπτει δάνεια στις χρηµατοπιστωτικές αγορές σε ένα κράτος µέλος µε την επιφύλαξη των διατάξεων του δικαίου εφαρµόζεται στα εσωτερικά δάνεια ή , εάν οι διατάξεις δεν υφίστανται σε ένα κράτος µέλος , µετά από συµφωνία έχει συµπληρώσει µεταξύ αυτών των κρατών µελών και τον οργανισµό παρέµβασης σε µέριµνα για τα προτεινόµενα δάνεια .

Google translation:
µετά από διαβούλευση µε το συµβούλιο, η επιτροπή καθορίζει τα όρια εντός των οποίων ο οργανισµός δύναται να συνάπτει δάνεια για διάστηµα που δεν υπερβαίνει τα δύο έτη. για τα δάνεια για διάστηµα µεγαλύτερο των δύο ετών ο οργανισµός πρέπει να λάβει η επιτροπή µε την έγκριση του συµβουλίου µε ειδική πλειοψηφία, σε κάθε συγκεκριµένη περίπτωση. ο οργανισµός δύναται να συνάπτει δάνεια στις χρηµατοοικονοµικές αγορές σε ένα κράτος µέλος υπό την επιφύλαξη των διατάξεων του δικαίου που εφαρµόζεται στα εσωτερικά δάνεια ή, αν τέτοιες διατάξεις δεν υπάρχουν σε ένα κράτος µέλος, αφού έχει επιτευχθεί συµφωνία µεταξύ των εν λόγω κράτος µέλος και σε ό, τι αφορά τον οργανισµό µε την προτεινόµενη δανείου.
Translation from Romanian into English

Our translation:
after consulting the council , the commission shall determine the limits within which the agency may raise loans on a period not exceeding two years . for the loan on the time more than two years , the agency must obtain , by means of the commission , the approval of the council , acting by a qualified majority in each individual case . the agency may raise loans on financial markets of a member state , subject to the legal provisions applicable to the internal loans or , if these provisions shall not exist in a member state , within the time in which the member states and the agency reach an agreement on the proposed loan .

Google translation:
after consulting the committee establishes limits within which the agency may contract loans on a maximum period of two years. for term loans of more than two years, the agency must get through committee, council approval, acting by a qualified majority in each case. agency may contract loans in the financial markets of a member state, subject to the laws applicable to domestic loans, or if there are no such provisions in a member state, when the member state and agency reach an agreement regarding the proposed loan .
Translation from Slovene into English

Our translation:
the commission after consultation with the council of the ceilings fixed in which the agency may contract loans with the deadline return which does not exceed two years . for the loan with the deadline return exceeding two years , the agency through the commission to obtain the approval of the council , acting on each case shall act by a qualified majority . loans on financial markets of the member state of the agency may raise the legal provisions applicable to internal loans , or if the member state of such provisions shall , under the agreements which are by the proposed loan achieved by the member states and the agency .

Google translation:
the commission, after consulting with the world determine the extent to which the agency can arrange loans with repayment date, not exceeding two years. for loans with repayment date, in excess of two years, the agency must obtain approval through the council, which on each case by a qualified majority. loans in the financial markets, member states, the agency may hire legal rules that apply to internal loans, or if the member state of such rules, under the agreement, which was reached on the proposed loan member state and the agency.
Translation from Greek into English

Our translation:
the commission , after opinion of the council shall designate the limits within which the agency may contract loans , of which the period not exceeding two years . for the loans exceeding two years , the agency shall be requested by the commission of the approval of the council , acting by a qualified majority separately for each case . the agency may contract loans on the capital markets of a member state , within the framework of the legislative provisions in force for lending in the territory or , if there is no such species provisions in the member state , since of that member state and the agency συνενοηθούν and agreed as to the loans which intend to conclude the agency .

Google translation:
the committee, after obtaining the opinion of the board, defines the limits within which the agency may borrow, whose term not exceeding two years. for loans longer than two years, the agency must request by the approval of the committee of the board, acting by a qualified majority separately for each case. the agency may borrow on capital markets member state, within the framework of the laws that apply to loans within or, if there are no provisions toioutoy kind in the member state where the member state and the agency synenoithoun and agree on the loan which intends to conclude the agency.
9.1. Evaluations

In the following we provide NIST and BLEU evaluations using the latest version of the official NIST mteval tool (ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v12.pl).

Overall Statistics
Language pair         Google NIST   Google BLEU   RACAI NIST   RACAI BLEU
English to Greek         3.5705        0.2934        3.9730       0.3533
English to Slovene       3.5340        0.2653        3.6719       0.2450
English to Romanian      4.4057        0.4508        4.9923       0.5634
Greek to English         3.5427        0.2868        3.8036       0.3005
Slovene to English       4.0424        0.2215        4.0589       0.2293
Romanian to English      4.3573        0.2827        4.6191       0.4862
MT evaluation scorer began on 2009 Jan 13 at 21:43:03
command line: ./mteval-v12.pl -r ref_gr.xml -s src_en.xml -t test_en_gr.xml
Evaluation of en-to-gr translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.5705  BLEU score = 0.2934 for system "google_gr_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.2557   0.2106   0.0778   0.0175   0.0088   0.0089   0.0090   0.0000   0.0000   "google_gr_constrained_primary"
BLEU:   0.5826   0.3333   0.2566   0.1964   0.1532   0.1273   0.1009   0.0741   0.0467   "google_gr_constrained_primary"

# ------------------------------------------------------------------------
Cumulative N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.2557   3.4663   3.5442   3.5617   3.5705   3.5795   3.5885   3.5885   3.5885   "google_gr_constrained_primary"
BLEU:   0.5435   0.4111   0.3433   0.2934   0.2541   0.2238   0.1978   0.1734   0.1487   "google_gr_constrained_primary"

MT evaluation scorer ended on 2009 Jan 13 at 21:43:04
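The individual and cumulative BLEU rows above are related in a simple way: the cumulative score at order n is the geometric mean of the first n individual n-gram precisions, scaled by the brevity penalty. A sketch, using the en-to-gr figures, with the brevity penalty inferred from the 1-gram pair 0.5435/0.5826:

```python
import math

def cumulative_bleu(precisions, brevity_penalty):
    """Cumulative BLEU at each order n: BP times the geometric mean of the
    first n individual n-gram precisions (precisions must be non-zero)."""
    scores, log_sum = [], 0.0
    for n, p in enumerate(precisions, start=1):
        log_sum += math.log(p)
        scores.append(brevity_penalty * math.exp(log_sum / n))
    return scores

# Individual 1..4-gram BLEU precisions from the en-to-gr table above;
# scores[3] reproduces the reported cumulative 4-gram BLEU of 0.2934.
scores = cumulative_bleu([0.5826, 0.3333, 0.2566, 0.1964], 0.5435 / 0.5826)
```
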
MT evaluation scorer began on 2009 Jan 13 at 21:48:07
command line: ./mteval-v12.pl -r ref_sl.xml -s src_en.xml -t test_en_sl.xml
Evaluation of en-to-sl translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.5340  BLEU score = 0.2653 for system "google_sl_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.3472   0.1626   0.0242   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   "google_sl_constrained_primary"
BLEU:   0.5816   0.2990   0.2083   0.1368   0.0957   0.0753   0.0652   0.0549   0.0444   "google_sl_constrained_primary"

# ------------------------------------------------------------------------
Cumulative N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
NIST:   3.3472   3.5098   3.5340   3.5340   3.5340   3.5340   3.5340   3.5340   3.5340   "google_sl_constrained_primary"
BLEU:   0.5816   0.4170   0.3309   0.2653   0.2164   0.1815   0.1568   0.1375   0.1213   "google_sl_constrained_primary"

MT evaluation scorer ended on 2009 Jan 13 at 21:48:07
MT evaluation scorer began on 2009 Jan 15 at 14:31:43
command line: ./mteval-v12.pl -r ref_ro.xml -s src_en.xml -t test_en_ro.xml
Evaluation of en-to-ro translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.4057  BLEU score = 0.4508 for system "google_ro_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9796  0.3605  0.0395  0.0174  0.0088  0.0088  0.0000  0.0000  0.0000
BLEU:  0.6949  0.5385  0.3966  0.2783  0.2105  0.1504  0.0893  0.0631  0.0364

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9796  4.3400  4.3796  4.3969  4.4057  4.4146  4.4146  4.4146  4.4146
BLEU:  0.6949  0.6117  0.5294  0.4508  0.3871  0.3307  0.2743  0.2282  0.1861
MT evaluation scorer ended on 2009 Jan 15 at 14:31:43
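Unlike BLEU, the cumulative NIST row is additive: the cumulative n-gram score is the running sum of the individual 1- to n-gram contributions, which is why it plateaus once the higher-order precisions reach zero. A sketch checking this against the en-to-ro run above (variable names are ours):

```python
from itertools import accumulate

# Individual NIST n-gram contributions reported for
# "google_ro_constrained_primary" (en-to-ro):
individual = [3.9796, 0.3605, 0.0395, 0.0174, 0.0088, 0.0088, 0.0, 0.0, 0.0]
cumulative = [round(s, 4) for s in accumulate(individual)]
print(cumulative)  # matches the reported cumulative NIST row up to rounding
```

The running sums agree with the reported row (3.9796, 4.3400, 4.3796, ...) to within 0.0001, the discrepancy coming only from the rounding of the individual values.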
MT evaluation scorer began on 2009 Jan 15 at 14:34:58
command line: ./mteval-v12.pl -r ref_en.xml -s src_gr.xml -t test_gr_en.xml
Evaluation of gr-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.5427  BLEU score = 0.2868 for system "google_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.1278  0.3322  0.0736  0.0091  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.5929  0.3482  0.2252  0.1455  0.0917  0.0463  0.0280  0.0094  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.1278  3.4600  3.5336  3.5427  3.5427  3.5427  3.5427  3.5427  3.5427
BLEU:  0.5929  0.4544  0.3596  0.2868  0.2283  0.1750  0.1347  0.0966  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:34:58
MT evaluation scorer began on 2009 Jan 15 at 14:36:59
command line: ./mteval-v12.pl -r ref_en.xml -s src_sl.xml -t test_sl_en.xml
Evaluation of sl-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.0424  BLEU score = 0.2215 for system "google_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6381  0.3313  0.0634  0.0097  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.6827  0.3301  0.1765  0.0792  0.0400  0.0202  0.0102  0.0000  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6381  3.9693  4.0327  4.0424  4.0424  4.0424  4.0424  4.0424  4.0424
BLEU:  0.6383  0.4438  0.3191  0.2215  0.1552  0.1092  0.0771  0.0000  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:36:59
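In the sl-to-en run the individual 1-gram precision (0.6827) exceeds the cumulative 1-gram BLEU (0.6383): the difference is the brevity penalty, which BLEU applies to all cumulative scores when the candidate is shorter than the reference. A sketch of the standard penalty, BP = exp(1 - r/c) for candidate length c below reference length r; the example lengths below are illustrative, not taken from the scorer log:

```python
import math

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty: 1.0 when the candidate is at least
    as long as the reference, exp(1 - r/c) when it is shorter."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# The penalty implied by this run is about 0.6383 / 0.6827:
print(round(0.6383 / 0.6827, 3))  # 0.935
# Hypothetical lengths producing a penalty in that neighbourhood:
print(round(brevity_penalty(100, 107), 3))
```

This also explains why the Moses and Google runs can have identical individual 1-gram precisions but different cumulative rows when their output lengths differ.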
MT evaluation scorer began on 2009 Jan 15 at 14:33:55
command line: ./mteval-v12.pl -r ref_en.xml -s src_ro.xml -t test_ro_en.xml
Evaluation of ro-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.3573  BLEU score = 0.2827 for system "google_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9133  0.3347  0.0797  0.0197  0.0100  0.0000  0.0000  0.0000  0.0000
BLEU:  0.7629  0.4271  0.2526  0.1383  0.0860  0.0543  0.0330  0.0111  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("google_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9133  4.2480  4.3277  4.3474  4.3573  4.3573  4.3573  4.3573  4.3573
BLEU:  0.6604  0.4941  0.3765  0.2827  0.2165  0.1679  0.1303  0.0941  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:33:55
MT evaluation scorer began on 2009 Jan 13 at 21:40:13
command line: ./mteval-v12.pl -r ref_gr.xml -s src_en.xml -t test_en_gr.xml
Evaluation of en-to-gr translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.9730  BLEU score = 0.3533 for system "moses_gr_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_gr_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.5693  0.2813  0.0871  0.0265  0.0089  0.0090  0.0091  0.0000  0.0000
BLEU:  0.6372  0.4375  0.3243  0.2455  0.1835  0.1389  0.1028  0.0660  0.0381

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_gr_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.5693  3.8506  3.9377  3.9641  3.9730  3.9820  3.9911  3.9911  3.9911
BLEU:  0.5832  0.4833  0.4108  0.3533  0.3045  0.2632  0.2272  0.1926  0.1593
MT evaluation scorer ended on 2009 Jan 13 at 21:40:13
MT evaluation scorer began on 2009 Jan 13 at 21:45:57
command line: ./mteval-v12.pl -r ref_sl.xml -s src_en.xml -t test_en_sl.xml
Evaluation of en-to-sl translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.6719  BLEU score = 0.2450 for system "moses_sl_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_sl_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.2759  0.3271  0.0690  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.5922  0.3235  0.1881  0.1000  0.0606  0.0408  0.0309  0.0208  0.0105

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_sl_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.2759  3.6030  3.6719  3.6719  3.6719  3.6719  3.6719  3.6719  3.6719
BLEU:  0.5922  0.4377  0.3303  0.2450  0.1853  0.1440  0.1156  0.0933  0.0732
MT evaluation scorer ended on 2009 Jan 13 at 21:45:57
MT evaluation scorer began on 2009 Jan 15 at 14:18:18
command line: ./mteval-v12.pl -r ref_ro.xml -s src_en.xml -t test_en_ro.xml
Evaluation of en-to-ro translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.9923  BLEU score = 0.5634 for system "moses_ro_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  4.5353  0.3713  0.0555  0.0201  0.0101  0.0102  0.0000  0.0000  0.0000
BLEU:  0.7843  0.6238  0.5300  0.4545  0.3878  0.3196  0.2708  0.2211  0.1809

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_ro_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  4.5353  4.9066  4.9621  4.9822  4.9923  5.0026  5.0026  5.0026  5.0026
BLEU:  0.7542  0.6725  0.6131  0.5634  0.5187  0.4754  0.4362  0.3987  0.3636
MT evaluation scorer ended on 2009 Jan 15 at 14:18:18
MT evaluation scorer began on 2009 Jan 15 at 14:28:04
command line: ./mteval-v12.pl -r ref_en.xml -s src_gr.xml -t test_gr_en.xml
Evaluation of gr-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 3.8036  BLEU score = 0.3005 for system "moses_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.3365  0.3760  0.0664  0.0246  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.6320  0.3710  0.2358  0.1475  0.0909  0.0417  0.0252  0.0169  0.0085

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.3365  3.7125  3.7790  3.8036  3.8036  3.8036  3.8036  3.8036  3.8036
BLEU:  0.6320  0.4842  0.3809  0.3005  0.2366  0.1771  0.1341  0.1035  0.0785
MT evaluation scorer ended on 2009 Jan 15 at 14:28:05
MT evaluation scorer began on 2009 Jan 15 at 14:29:01
command line: ./mteval-v12.pl -r ref_en.xml -s src_sl.xml -t test_sl_en.xml
Evaluation of sl-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.0589  BLEU score = 0.2293 for system "moses_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6350  0.3516  0.0541  0.0182  0.0000  0.0000  0.0000  0.0000  0.0000
BLEU:  0.6726  0.3304  0.1712  0.0727  0.0275  0.0093  0.0000  0.0000  0.0000

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.6350  3.9866  4.0407  4.0589  4.0589  4.0589  4.0589  4.0589  4.0589
BLEU:  0.6726  0.4714  0.3363  0.2293  0.1501  0.0943  0.0000  0.0000  0.0000
MT evaluation scorer ended on 2009 Jan 15 at 14:29:01
MT evaluation scorer began on 2009 Jan 15 at 14:30:14
command line: ./mteval-v12.pl -r ref_en.xml -s src_ro.xml -t test_ro_en.xml
Evaluation of ro-to-en translation using:
  src set "genmtevalfiles-set" (1 docs, 1 segs)
  ref set "genmtevalfiles-set" (1 refs)
  tst set "genmtevalfiles-set" (1 systems)

NIST score = 4.6191  BLEU score = 0.4682 for system "moses_en_constrained_primary"

# ------------------------------------------------------------------------
Individual N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9877  0.5150  0.0819  0.0259  0.0087  0.0000  0.0000  0.0000  0.0000
BLEU:  0.7311  0.5424  0.4017  0.3017  0.2261  0.1667  0.1239  0.0893  0.0721

# ------------------------------------------------------------------------
Cumulative N-gram scoring ("moses_en_constrained_primary")
       1-gram  2-gram  3-gram  4-gram  5-gram  6-gram  7-gram  8-gram  9-gram
NIST:  3.9877  4.5026  4.5845  4.6104  4.6191  4.6191  4.6191  4.6191  4.6191
BLEU:  0.7311  0.6297  0.5421  0.4682  0.4048  0.3491  0.3011  0.2587  0.2244
MT evaluation scorer ended on 2009 Jan 15 at 14:30:15