Calling Professionals: Help Us to Understand Your Needs!
Helena Blancafort
Tatiana Gornostay
Questionnaire-based online survey
TTC
• Online survey with 41 questions about practices in the translation and localization industry
• Questions about:
  - use of translation software
  - terminology management practices and tools
  - use of corpora
  - use of NLP tools
• Successful call for participants: 139 respondents!
Call for participants
Call for participants via:
• mailing lists (MT-list, LinguistList, etc.)
• thematic groups on professional networks (LinkedIn: Localization Professional, Translation Management Systems, TradOnline groups and others)
• social networks (Twitter)
• professional forums (Translators Café)
• Tilde localization departments
• freelance translators (personal contacts)
Participants’ profile
• 139 respondents
• 28 working languages (mainly from/into EN)
• 31 countries
• Occupation: in-house translators, terminologists, language teachers and translator trainers, localization experts, freelance translators (30%)
Participants’ profile: field
Number of respondents per field:
• Technical translation: 93
• Software localization: 58
• Legal translation: 57
• Other (medical, tourism): 46
• Mass media translation: 29
• Literary translation: 23
• Technical writing: 15
• Game localization: 12
Use of translation software
• 74% use translation software
• Type of software used: CAT tools 24%, MT 9%, both 67%
• Most used CAT tools: 17% Trados, 12% Similis, 10% Transit, 6-7% Logoport and Google Translator Toolkit, 5% Wordfast
Use of MT software
• Type of MT software used: free online, commercial, or both (chart values: 24%, 52%, 24%)
• Most used free online MT software: 41% Google Translate, 10% SDL Free Translation, 10% Yahoo Babel Fish, 10% Systran Free; 2 respondents mentioned Moses
• Most used commercial MT software: 17% Systran, 9% Language Weaver, 7% Reverso, 6% Promt, 6% Tilde
Reasons for not using MT
• Principal hurdles: price, translation quality, not suitable for the specific domain the respondent works in
• Others: “Not necessary, wastes time”; “people in my industry, which is legal, don't trust it and don't understand word sense disambiguation”
Terminology management
• 56% spend 10-30% of their time working with terminology
• Among the different activities, terminology research and collection come first (40%)
• Type of terminology management: 13.4% monolingual, 60.5% bilingual, 24.4% multilingual
Multilingual terminology management
• Terminology research (21.7%)
• Terminology collection (17.1%)
• Editing terminology in texts (13.8%)
• Terminology creation (12.9%)
• Terminology validation/standardization (8.6%)
• Terminology exchange (import/export functions) (7.2%)
Terminology management tools
Most used terminology management tools:
• 13% SDL Termbase
• 13% MultiTerm
• 7% integrated TM tools in the CAT system
Terminology storage formats
• Word document (34.9%)
• Excel sheet (34.9%)
• Other (13.1%)
• TBX (9.6%)
• TMF (6.6%)
• LMF (0.9%)
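Most respondents keep terminology in Word or Excel files, while fewer than 10% use TBX. As an illustration of what the exchange format involves, here is a minimal sketch that converts a spreadsheet glossary exported as CSV (one concept per row, one column per language) into a TBX-style file. It assumes a hypothetical CSV with language-code column headers and uses the TBX 2008 element names (`martif`, `termEntry`, `langSet`, `tig`, `term`); the `glossary_to_tbx` helper is hypothetical, and the output is not validated against any TBX schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# xml:lang in ElementTree's {namespace}local notation; serialized as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def glossary_to_tbx(csv_text: str, languages: list[str]) -> str:
    """Hypothetical helper: convert a CSV glossary (one concept per row,
    one column per language code) into a minimal TBX-style document."""
    martif = ET.Element("martif", {"type": "TBX", XML_LANG: languages[0]})
    header = ET.SubElement(martif, "martifHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from a spreadsheet glossary"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")

    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader, start=1):
        # One termEntry per concept (row), one langSet per non-empty language cell.
        entry = ET.SubElement(body, "termEntry", {"id": f"c{i}"})
        for lang in languages:
            term = (row.get(lang) or "").strip()
            if not term:
                continue  # skip empty cells instead of writing empty terms
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")


if __name__ == "__main__":
    sample = "en,fr\nterm base,base terminologique\nsource language,langue source\n"
    print(glossary_to_tbx(sample, ["en", "fr"]))
```

Keeping one `termEntry` per row mirrors the concept-oriented structure of TBX, so a bilingual spreadsheet can later gain further language columns without changing the layout.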
Linguistic resources usually used for terminology research
• Internal resources: dictionaries, glossaries, databases
• Client resources
• Other
• Online resources: search engines, dictionaries, termbanks
• “I do not research terminology”
Term related information usually researched
• Lexical: translation equivalents, definition in the source language, etc. (22.4%)
• Contextual: example sentences, etc. (19.2%)
• Usage: style, usage notes, frequency, etc. (15.3%)
• Categorical: subject fields, domains, products, etc. (13.2%)
• Term relations: synonyms, antonyms, acronyms, related terms, etc. (13.1%)
• Grammatical: part of speech, inflection, etc. (10.6%)
• Administrative: status, date, author, source, etc. (5.5%)
• Other (0.7%)
Respondents’ needs and wishes about an online terminology database
Important fields for an online terminology database
• Translation equivalents in different languages (21.8%)
• Definitions in different languages (19.7%)
• Contextual: example sentences, etc. (18.2%)
• Usage: style, usage notes, frequency, etc. (10.6%)
• Categorical: subject fields, domains, products, etc. (10.3%)
• Term relations: synonyms, antonyms, acronyms, related terms, etc. (10%)
• Grammatical: part of speech, inflection, etc. (6.2%)
• Administrative: status, modification date, author, source, etc. (2.6%)
• Other (0.6%)
Important features and aspects for an online terminology database
• Lookup speed
• Number of results returned: as precise as possible
• Good coverage of terms across many languages and domains
• Hyperlinks (terms can be reverse-looked up with a simple click)
• Saving my terms / saving search history
• Number of results returned: the more the better
Filtering techniques to narrow the search
• by a specific subject area/category/domain
• by one source and one target language
• by several subject areas/categories/domains
• by one source language and several target languages
• by selecting only some of over 100 databases to be used for lookup
• by several languages I use
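Taken together, these filters restrict a lookup by language pair(s), subject domain(s) and source database. The sketch below shows this over a hypothetical in-memory list of term records. The `TermRecord` fields and the `lookup` function are illustrative assumptions, not the API of any existing termbase, and matching is a plain case-insensitive comparison on the source term.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class TermRecord:
    """Hypothetical record: one source term and its equivalent in one target language."""
    source_lang: str
    target_lang: str
    source_term: str
    target_term: str
    domain: str
    database: str


def lookup(
    records: Iterable[TermRecord],
    query: str,
    source_lang: str,
    target_langs: set[str],
    domains: Optional[set[str]] = None,
    databases: Optional[set[str]] = None,
) -> list[TermRecord]:
    """Return records whose source term matches `query`, narrowed by the filters.

    `target_langs` holds one or several target languages; `domains` and
    `databases` are optional (None means: do not filter on this field).
    """
    q = query.casefold()
    hits = []
    for r in records:
        if r.source_lang != source_lang or r.target_lang not in target_langs:
            continue
        if domains is not None and r.domain not in domains:
            continue
        if databases is not None and r.database not in databases:
            continue
        if r.source_term.casefold() == q:  # exact, case-insensitive match
            hits.append(r)
    return hits


if __name__ == "__main__":
    sample = [
        TermRecord("en", "fr", "bearing", "roulement", "mechanics", "db1"),
        TermRecord("en", "fr", "bearing", "relèvement", "navigation", "db2"),
        TermRecord("en", "de", "bearing", "Lager", "mechanics", "db1"),
    ]
    # One source language, one target language, one domain.
    for hit in lookup(sample, "bearing", "en", {"fr"}, domains={"mechanics"}):
        print(hit.target_term)
```

Passing `None` rather than an empty set for an unused filter keeps "no restriction" distinct from "restrict to nothing".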
Do you own a terminology database that you would like to share with other users?
• 55 respondents are willing to share their terminology database
• 53 respondents do not want to share their terminology database
• 17 respondents do not have a terminology database or have a non-organized one
• 5 respondents cannot share their terminology database because their clients own it
Use of corpora and NLP Tools
Use of corpora
• 50% collect corpora of the relevant domain
• Type of corpora: 39% parallel, 18% comparable, 44% both
• Most common strategy (39): quick reading and highlighting of terms, followed by manual research of an equivalent
• Only 7 respondents use automatic processing
• Hurdle: time-consuming task
Use of online corpora concordance tools
• Only 30% of respondents use concordance tools
• 28 respondents did not know these tools
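A concordance tool shows every occurrence of a search term with its left and right context (keyword in context, KWIC). Here is a minimal sketch over a plain-text corpus. It assumes simple regular-expression tokenization, single-word search terms and a fixed window of a few words; production concordancers typically add lemmatization, POS filters and corpus indexing. The `kwic` function is hypothetical.

```python
import re

# Words, allowing internal hyphens or apostrophes (e.g. "cross-lingual", "don't").
TOKEN = re.compile(r"\w+(?:[-']\w+)*")


def kwic(text: str, term: str, window: int = 4) -> list[str]:
    """Return keyword-in-context lines for a single-word `term` (case-insensitive)."""
    tokens = TOKEN.findall(text)
    target = term.casefold()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.casefold() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Right-align the left context so the keywords line up in one column.
            lines.append(f"{left:>40} [{tok}] {right}")
    return lines


if __name__ == "__main__":
    corpus = ("The term base stores each term with its definition. "
              "Translators search the term base before creating a new term.")
    for line in kwic(corpus, "term", window=3):
        print(line)
```

Aligning the keyword in a fixed column is what makes the typical usage patterns of a term easy to scan by eye.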
Use of NLP tools to process the collected corpora
• Only 10% use NLP tools, mainly POS taggers
• 14 respondents think such tools are not available to them, or do not know how to use them and have no time to learn
• 9 respondents did not know what NLP tools are
• 5 respondents think they are not useful or do not cover their needs
Users’ wishes for collection of corpora
• Search functions
  - concordancer (also with POS tags) showing the context of a term
  - rich metadata search
• Automatic updating (crawling)
  - RSS feeds, to be validated manually
  - automatic categorization into defined domains
  - automatic saving of the link
• Frequency lists
• Annotation function
• Collaborative tool: sharing with others
• Formats: plain text with one sentence per line, TMX
• Terminology extraction tools
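Frequency lists, one of the requested features, can be produced in a few lines. This sketch assumes the corpus is stored as plain text with one sentence per line and uses a simple lowercase word tokenizer, with no lemmatization or stop-word filtering. The `frequency_list` function is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Words, allowing internal hyphens or apostrophes.
WORD = re.compile(r"\w+(?:[-']\w+)*")


def frequency_list(lines: Iterable[str], top: int = 10) -> list[tuple[str, int]]:
    """Count lowercased word tokens over sentences (one per line); return the `top` most frequent."""
    counts: Counter[str] = Counter()
    for line in lines:
        counts.update(tok.casefold() for tok in WORD.findall(line))
    return counts.most_common(top)


if __name__ == "__main__":
    corpus = [
        "The corpus contains parallel texts.",
        "Parallel corpora help terminology research.",
    ]
    for word, freq in frequency_list(corpus, top=3):
        print(f"{freq}\t{word}")
```

Because `lines` can be any iterable of strings, an open file handle in the sentence-per-line format can be passed directly instead of a list.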
We thank all the participants for their time and contribution!