Terminology management tools

Calling Professionals: Help Us to Understand Your Needs!

Helena Blancafort

Tatiana Gornostay

Questionnaire-based online survey

TTC

• Online survey with 41 questions about practices in the translation and localization industry
• Questions about
  - Use of translation software
  - Terminology management practices and tools
  - Use of corpora
  - Use of NLP tools
• Successful call for participants with 139 respondents!

Call for participants

Call for participants via
• mailing lists (MT-list, LinguistList, etc.)
• thematic groups on professional networks (LinkedIn Localization Professional, Translation Management Systems, TradOnline groups and others)
• social networks (Twitter)
• professional forums (Translators café)
• Tilde Localization departments
• freelance translators (personal contacts)

Participants’ profile

• 139 respondents
• 28 working languages (mainly from/into EN)
• 31 countries
• Occupation
  - In-house translators
  - Terminologists
  - Language teachers and translator trainers
  - Localization experts
  - Freelance translators (30%)

Participants’ profile

Field                            Number of respondents
Technical translation            93
Software localization            58
Legal translation                57
Other (medical, tourism)         46
Mass media translation           29
Literary translation             23
Technical writing                15
Game localization                12

Use of translation software

• 74% use translation software
• CAT tools
  - 17% Trados
  - 12% Similis
  - 10% Transit
  - 6-7% Logoport and Google Translation Kit
  - 5% Wordfast

[Chart: type of translation software used (CAT tools, MT, both); chart values 24, 9 and 67]

Use of MT software

[Chart: type of MT software used (free online, commercial, both); chart values 24%, 52% and 24%]

Most used free online MT software
• 41% Google Translate
• 10% SDL Free Translation
• 10% Yahoo Babel Fish
• 10% Systran Free
• 2 persons mentioned Moses

Most used commercial MT software
• 17% Systran
• 9% Language Weaver
• 7% Reverso
• 6% Promt
• 6% Tilde

Reasons for not using MT

Principal hurdles: price, translation quality, not suitable for the specific domain the person is working in

Others:
• “Not necessary, wastes time”
• “people in my industry, which is legal, don't trust it and don't understand word sense disambiguation”

Terminology management

• 56% spend 10-30% of their time working with terminology
• Different activities: terminology research and collection at the top (40%)
• Terminology management: 13.4% monolingual, 60.5% bilingual, 24.4% multilingual

Multilingual terminology management

• Terminology research: 21.7%
• Terminology collection: 17.1%
• Editing terminology in texts: 13.8%
• Terminology creation: 12.9%
• Terminology validation/standardization: 8.6%
• Terminology exchange (import/export functions): 7.2%

[Chart: two further values, 7 and 2, appear without labels]

Terminology management tools

Most used terminology management tools
• 13% SDL Termbase
• 13% Multiterm
• 7% Integrated TM tools in CAT-system

Terminology storage formats

• Word document: 34.9%
• Excel sheet: 34.9%
• Other (please specify): 13.1%
• TBX: 9.6%
• TMF: 6.6%
• LMF: 0.9%

Linguistic resources usually used for terminology research

• Internal resources: dictionaries, glossaries, databases
• Client resources
• Online resources: search engines, dictionaries, termbanks
• Other
• I do not research terminology

[Chart values as extracted, without category labels: 33.2, 27.6, 4.3, 1.4, 0.5]

Term related information usually researched

• Lexical: translation equivalents, definition in a source language, etc.: 22.4%
• Contextual: example sentence, etc.: 19.2%
• Usage: style, usage note, frequency, etc.: 15.3%
• Categorical: subject fields, domains, products, etc.: 13.2%
• Term relations: synonym, antonym, acronym, related terms, etc.: 13.1%
• Grammatical: part of speech, inflection, etc.: 10.6%
• Administrative: status, date, author, source, etc.: 5.5%
• Other (please specify): 0.7%

Respondents’ needs and wishes about an online terminology database

Important fields for an online terminology database

• Translation equivalents in different languages: 21.8%
• Definitions in different languages: 19.7%
• Contextual: example sentence, etc.: 18.2%
• Usage: style, usage note, frequency, etc.: 10.6%
• Categorical: subject fields, domains, products, etc.: 10.3%
• Term relations: synonym, antonym, acronym, related terms, etc.: 10%
• Grammatical: part of speech, inflection, etc.: 6.2%
• Administrative: status, modification date, author, source, etc.: 2.6%
• Other (please specify): 0.6%

Important features and aspects for an online terminology database

• Lookup speed
• Number of lookups returned: as precise as possible
• Number of lookups returned: the more the better
• Good coverage of terms across many languages and domains
• Hyperlinks (terms can be reverse-looked up by a simple click)
• Feature of saving my terms / saving search history

[Chart values as extracted, without category labels: 25, 22.2, 21.8, 15.1, 8.8, 5.6, 1.4]

Filtering techniques to narrow the search

• by specific subject area/category/domain
• by several subject areas/categories/domains
• by one source and one target language
• by one source language and several target languages
• by several languages of my use
• by selecting only some out of over 100 databases to be used for lookup

[Chart values as extracted, without category labels: 24.1, 21.1, 14.4, 12.4, 10.4, 9.4, 8.4]

Do you own a terminology database that you would like to share with other users?

• 55 respondents would agree to share their terminology database
• 53 respondents do not want to share their terminology database
• 17 respondents do not have a terminology database or have a non-organized terminology database
• 5 respondents cannot share their terminology database because it is owned by their clients

Use of corpora and NLP Tools

Use of corpora

• 50% collect corpora of the relevant domain
• 39% parallel corpora, 18% comparable corpora, 44% both
• Most common strategy (39): quick reading and highlighting of terms, then manual research of an equivalent
• Only 7 respondents use automatic processing
• Hurdle: time-consuming task

Use of online corpora concordance tools

• Only 30% of respondents use concordance tools
• 28 respondents did not know these tools

Use of NLP tools to process the collected corpora

• Only 10% use NLP tools, mainly POS taggers
• 14 respondents believe no such tools are available, or do not know how to use them and have no time to learn
• 9 respondents did not know what NLP tools are
• 5 respondents think they are not useful or do not cover their needs

Users’ wishes for collection of corpora

• Search functions
  - Concordancer (also POS tags), show context of a term
  - Rich metadata search
• Automatic updating (crawl)
  - RSS feeds to be validated manually
  - Automatic categorization to defined domains
  - Automatic saving of the link
• Frequency lists
• Annotation function
• Collaborative tool: share with others
• Formats: sentence per line plaintext, TMX format
• Terminology extraction tools
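
The concordancer wish in the list above (show the context of a term) can be illustrated with a minimal keyword-in-context sketch. It is an illustration only, not a tool named in the survey: the function name `kwic`, the naive whitespace tokenization, the one-sentence-per-line input and the two sample sentences are all assumptions made for this example.

```python
def kwic(sentences, term, width=3):
    """Return (left, keyword, right) context triples for each hit of `term`.

    Hypothetical helper: assumes one sentence per string and
    naive whitespace tokenization (no POS tags, no lemmatization).
    """
    hits = []
    needle = term.lower()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            # Strip surrounding punctuation, then compare case-insensitively.
            if token.strip(".,;:!?()\"'").lower() == needle:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


# Made-up sample corpus in sentence-per-line form.
corpus = [
    "The termbase stores each term with its definition.",
    "Export the termbase to TBX before sharing it.",
]

for left, keyword, right in kwic(corpus, "termbase", width=2):
    print(f"{left:>20} [{keyword}] {right}")
```

A real concordancer would also need proper tokenization and, for the POS-tag search respondents asked for, a tagged corpus; this sketch only shows the context-window idea.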

We thank all the participants for their time and contribution!