
The Translation-oriented Annotation System: A tripartite annotation system for translation research

Sylviane Granger & Marie-Aude Lefer
UCLouvain

Parallel Corpora: Creation and Applications. International Symposium PaCor 2018
Madrid, November 5-7 2018

This paper presents the computer-aided Translation-oriented Annotation System (TAS) developed within the framework of the Multilingual Student Translation (MUST) project, a new learner translation corpus (LTC) initiative launched by the Centre for English Corpus Linguistics at the UCLouvain¹. The MUST project brings together translation and foreign language researchers and teachers around two main objectives: to collect and share translations produced by students, and to process them using a standardized set of tools and guidelines with a view to advancing empirical research and optimizing translation teaching.

Compared to other LTCs, which are generally limited to one language pair and/or translation direction and cover a rather restricted set of source texts (see e.g. Kübler 2008, Espunya 2014, Štěpánková 2014, Kutuzov and Kunilovskaya 2014, Wurm 2016), MUST is truly multilingual, represents a wide range of text types, genres and topics, and includes both L2>L1 and L1>L2 translations. The project currently includes 31 research teams from 14 countries and covers 15 languages (Chinese, French, Dutch, English, Galician, German, Greek, Italian, Lithuanian, Macedonian, Norwegian, Polish, (Brazilian) Portuguese, Slovene, Spanish).

To ensure easy access for all partners in the project, all the data are collected and searchable on a web-based interface, Hypal4MUST, an adapted version of the Hypal software tool designed by Obrusnik (2014) for the processing of parallel texts. Within the MUST project, TAS has been designed to help researchers annotate student translations on the Hypal4MUST interface, using a language-independent, standardized annotation system.
The main inspirations for TAS were the CELTraC taxonomy (Javora 2015, Fictumová et al. 2017), itself based on the MeLLANGE translation error typology (Kübler 2008)², and the ICLE and FRIDA error taxonomies designed within the framework of the International Corpus of Learner English (ICLE) and the French Interlanguage Database (FRIDA) projects (Dagneaux et al. 1998, Granger 2003). TAS has two distinctive characteristics that set it apart from the translation error typologies used in other LTCs: first, it offers the option of marking translation procedures (such as generalization or explicitation), thereby going beyond the analysis of translation errors and catering for theoretically oriented research; second, it offers the possibility of highlighting both erroneous and correct use. In view of these two features, it was decided to refer to the annotation system as “translation-oriented annotation” rather than “error annotation”.

TAS (version 1.0) is made up of the following three parts: (1) ST-TT transfer: discrepancies between the source text (ST) and the target text (TT) and/or between the TT and the translation brief, (2) Language: features of the TT that are erroneous and/or inappropriate independently of the ST, (3) Translation procedures: procedures used to solve translation problems, which can be observed when comparing the translation with its source text. Each part of TAS is multi-layered. For example, ST-TT transfer is made up of the following categories and subcategories: Content transfer (Omission, Addition, Distortion, Indecision), Lexis (Translated untranslatable, Untranslated translatable, Term translated by non-term, Non-term translated by term), Discourse and pragmatics (Connectors, Theme-rheme), Register and culture (Register mismatch, Cultural mismatch), Translation brief (Inconsistent with glossary, Formatting). Besides the categories and subcategories, there are two meta-tags which the annotator can add to any (sub)category of TAS: (1) a plus sign to mark positive features, (2) a tag to mark suspected SL intrusion.

As it is hierarchical, the system allows analysts to tag at the level of granularity that best fits their research or teaching aims. They may opt for broad categories (in Language: Grammar, Lexis and terminology, Cohesion, Mechanics, Style and situational context) or for more detailed annotation (e.g. distinguishing between Spelling, Punctuation and Units/dates/numbers in the Mechanics category). In addition, TAS features correction boxes and comment boxes, in which the annotator can add free comments on tagged words or phrases and/or on the whole translation.

In our presentation, we will first give a general overview of TAS, focusing on some of the challenges we faced when designing the first version of the system (e.g. cross-linguistic applicability of the categories, balance between added value for research and real-life use in foreign language and translation teaching). Second, we will critically assess the first two parts of the system (ST-TT transfer and Language) with a case study based on English-to-French student translations in the field of economics and finance.

¹ https://uclouvain.be/en/research-institutes/ilc/cecl/must.html
² http://corpus.leeds.ac.uk/mellange/ltc.html#LTC_annotation
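The hierarchical tagset and the two meta-tags described above can be pictured as a small data structure. The following Python sketch is purely illustrative and is not the Hypal4MUST implementation: the `Annotation` class, its field names, and the label format (`+` prefix, `[SL]` suffix) are assumptions made for this example. Only the category and subcategory names of the ST-TT transfer part are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Categories and subcategories of the ST-TT transfer part, as listed in the abstract.
ST_TT_TRANSFER = {
    "Content transfer": ["Omission", "Addition", "Distortion", "Indecision"],
    "Lexis": [
        "Translated untranslatable",
        "Untranslated translatable",
        "Term translated by non-term",
        "Non-term translated by term",
    ],
    "Discourse and pragmatics": ["Connectors", "Theme-rheme"],
    "Register and culture": ["Register mismatch", "Cultural mismatch"],
    "Translation brief": ["Inconsistent with glossary", "Formatting"],
}


@dataclass
class Annotation:
    """One tagged span of a target text (hypothetical record layout)."""

    span: str                          # the tagged word or phrase
    category: str                      # broad category, e.g. "Lexis"
    subcategory: Optional[str] = None  # finer-grained tag; optional by design
    positive: bool = False             # meta-tag 1: plus sign for a positive feature
    sl_intrusion: bool = False         # meta-tag 2: suspected SL intrusion
    correction: str = ""               # content of the correction box
    comment: str = ""                  # content of the comment box

    def __post_init__(self) -> None:
        # Reject tags that are not part of the ST-TT transfer hierarchy.
        if self.category not in ST_TT_TRANSFER:
            raise ValueError(f"unknown category: {self.category!r}")
        if (
            self.subcategory is not None
            and self.subcategory not in ST_TT_TRANSFER[self.category]
        ):
            raise ValueError(
                f"{self.subcategory!r} is not a subcategory of {self.category!r}"
            )

    def label(self) -> str:
        """Render the tag at whatever level of granularity was chosen."""
        text = self.category
        if self.subcategory is not None:
            text += " > " + self.subcategory
        if self.positive:
            text = "+" + text
        if self.sl_intrusion:
            text += " [SL]"
        return text


# Invented example spans: a broad tag, and a detailed tag with a meta-tag.
broad = Annotation(span="example phrase", category="Lexis")
detailed = Annotation(
    span="example phrase",
    category="Content transfer",
    subcategory="Omission",
    sl_intrusion=True,
)
print(broad.label())
print(detailed.label())
```

Making `subcategory` optional mirrors the point that analysts may stop at a broad category or descend to a detailed one, while the two boolean fields keep the meta-tags independent of the hierarchy so that they can attach to any (sub)category.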

References

Dagneaux, E., Denness, S. and Granger, S. 1998. “Computer-aided Error Analysis”. System: An International Journal of Educational Technology and Applied Linguistics 26(2), 163-174.
Espunya, A. 2014. “The UPF learner translation corpus as a resource for translator training”. Language Resources and Evaluation 48(1), 33-43.
Fictumová, J., Obrusnik, A. and Štěpánková, K. 2017. “Teaching Specialized Translation: Error-tagged Translation Learner Corpora”. Sendebar 28, 209-241.
Granger, S. 2003. “Error-tagged learner corpora and CALL: a promising synergy”. CALICO 20(3), 465-480.
Jarova, S. 2015. Defining an Error Typology: the Case of CELTraC. Bachelor’s Diploma Thesis. Masaryk University.
Kübler, N. 2008. “A comparable Learner Translator Corpus: Creation and use”. LREC 2008 Workshop on Comparable Corpora, 73-78.
Kutuzov, A. and Kunilovskaya, M. 2014. “Russian Learner Translator Corpus: Design, Research Potential and Applications”. In P. Sojka, A. Horák, I. Kopeček and K. Pala (eds.) Text, Speech and Dialogue. Lecture Notes in Computer Science. Springer, 315-323.
Obrusnik, A. 2014. “Hypal: A User-Friendly Tool for Automatic Parallel Text Alignment and Error Tagging”. Eleventh International Conference Teaching and Language Corpora, Lancaster, 20-23 July 2014, 67-69.
Štěpánková, K. 2014. Learner Translation Corpus: CELTraC (Czech-English Learner Translation Corpus). Bachelor’s Diploma Thesis. Masaryk University.
Wurm, A. 2016. “Presentation of the KOPTE Corpus and Research Project”. https://www.academia.edu/24012369/Presentation_of_the_KOPTE_Corpus_and_Research_Project
