Terminology standards – enhancing language. ISO/TC 37 → Semantic Interoperability
ISO TC 37 Secretariat c/o Infoterm
Christian Galinski
Bamako (Mali) 2005-05-06/07
Overview
• UNESCO’s IFAP Area 4
• IFAP
• UNESCO and multilinguality
• Advocating open access solutions
• Language in industry
• eContent development
• Global semantic interoperability
• Standards for ...
• Terminology standardization
• Terminology? → Content entities
• Terminology ↔ eContent
• Terminology in ISO/TC 37
• + Language resources & LR management
• + Content resources
• Standardization of terminological principles and methods
• ISO/TC 37
• ISO/TC 37/SC 1 ~ 4
• ISO/TC 37
• Outlook
• Semantic interoperability – HOW?
ISO/TC 37 – Bamako 2005-06/07
What is terminology?
• The description of the specialized vocabulary of an application domain
• Cf. Eugen Wüster: conceptual view → knowledge representation at concept level
• Monolingual or multilingual
• Mainly nouns (incl. multi-word nominal units), some verbs, adjectives and adverbs
• A strong yet practical simplification of lexical description
• Increasing occurrence of non-verbal knowledge representations
IFAP Areas of intervention
What are IFAP’s areas of intervention?
• Area 1: Development of international, regional and national information policies
• Area 2: Development of human resources and capabilities for the information age
• Area 3: Strengthening institutions as gateways for information access
• Area 4: Development of information processing and management tools and systems (Multilingualism) → standards
→ ISO/TC 37 methodology standards:
• terminology
• language resources (at the level of concepts)
• other content entities (at the level of concepts)
UNESCO and multilinguality
Promoting a wider, more equitable access to information (« Recommendation on the promotion of multilingualism and universal access to Cyberspace »/ Initiative B@bel)
Raising awareness of issues of equitable access and multilingualism
Encouraging Member States to
• Develop strong policies which promote and facilitate language diversity on the Internet → Guidelines for Terminology Policies
• Create widely available online tools and applications (such as terminologies, automatic translators, dictionaries) for content in local languages
• Share best practices and information → ISO/TC 37
Advocating open access solutions
“Member States and international organizations should encourage open access solutions including the formulation of technical and methodological standards for information exchange, portability and interoperability, as well as online accessibility of public domain information on global information networks.” (UNESCO Recommendation on Multilingualism and Access to Cyberspace)
“Governments should promote the development and use of open, interoperable, non-discriminatory and demand-driven standards.” (WSIS Action Plan)
→ Open source software? + Open content?
Language in industry
Exchange of content entities: e.g. entry in a product catalogue
• Name of company (® enterprise)
225/55/16
• Name of product (model) (™ enterprise)
• Generic name of product (e.g. © Harmonized System)
• Class (name under which the product falls) (e.g. © eCl@ss)
• Verbal/textual description (© enterprise)
• Picture (© rights owner)
• Technical data
  • (unified) branch properties (e.g. © OAGi)
  • Standardized characteristics (e.g. © DIN)
  • Enterprise product specific data (e.g. for collaborative business)
  • Enterprise internal data (maybe confidential/secret)
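The catalogue entry outlined on this slide can be pictured as a small record type. The following is only a minimal sketch: the class name, field names, company, product and class code are hypothetical placeholders, and treating the value 225/55/16 from the slide as a size designation is an assumption. It shows one possible way to keep enterprise-internal data separate from the data that is exchanged.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class CatalogueEntry:
    """Hypothetical product catalogue entry; all field names are illustrative."""
    company_name: str
    product_name: str
    generic_name: str
    product_class: str
    # Verbal/textual descriptions keyed by language code (multilingual content).
    descriptions: Dict[str, str] = field(default_factory=dict)
    picture_uri: Optional[str] = None
    # The four kinds of technical data listed on the slide.
    branch_properties: Dict[str, str] = field(default_factory=dict)
    standardized_characteristics: Dict[str, str] = field(default_factory=dict)
    enterprise_specific_data: Dict[str, str] = field(default_factory=dict)
    internal_data: Dict[str, str] = field(default_factory=dict)

    def exchange_view(self) -> dict:
        """Return the entry as a dict without the enterprise-internal data."""
        data = asdict(self)
        data.pop("internal_data")
        return data


# Placeholder values only; none of them come from a real catalogue.
entry = CatalogueEntry(
    company_name="Example Tyres Ltd",
    product_name="Example Model X",
    generic_name="pneumatic tyre",
    product_class="placeholder-class-code",
    descriptions={
        "en": "Summer tyre for passenger cars",
        "fr": "Pneu d'été pour voitures particulières",
    },
    standardized_characteristics={"size designation": "225/55/16"},
    internal_data={"purchase price": "confidential"},
)
shared = entry.exchange_view()
```

Keeping each kind of technical data in its own field makes it explicit which parts of an entry rely on shared (branch or standardized) definitions and which are meaningful only inside one enterprise.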
eContent DEVELOPMENT
Workflow management for content development:
→ net-based, distributed, cooperative creation of structured content
  • multilingual
  • multimodal
  • multimedia
complying with
  • multi-channel output
  • accessibility requirements
Re-use in applications (based on the “single-source” principle):
• eLearning
• eGovernment
• eHealth
• eBusiness
• other e...s
→ CO-OPERATION → INTEROPERABILITY → STANDARDIZATION
THE CHALLENGE: (user point-of-view)
• throughout the enterprise/organization → requested e.g. in e-government
• between enterprises/organizations → requested by the market
• within industry consortia → requested by industry branches
• between industry consortia → ??? (urgently needs harmonization and especially open standards)
• between different e…s → requested by the user
• between different language communities → requested by the end user
→ within the standardization world
→ Global Semantic Interoperability
STANDARDS FOR: hw → sw → methodology standards
• Technology → ITU, ISO, IEC, industry
• Business models → UN/ECE, ISO, industry
• “Language” → ISO/TC 37, research consortia
• Transfers/transactions → ITU, UN/ECE, industry
• Standards* → MoU/MG – why?
• Content → ? Methodology!!! → semantic interoperability
• Legal issues ?
* Standards should be examined as to whether they support, allow or hinder multilinguality and cultural diversity (very important for SMEs) and semantic interoperability at large
Terminology standardization
Standardization of terminologies
• Terminological data
  • Linguistic and non-linguistic representations
    • Designations: term, abbreviation, graphic symbol, formula, acoustic symbol, etc.
    • Descriptions: definition, explanation, non-linguistic [descriptive] representation, etc.
  • Source-related data
  • Data management related data (field, record, holding)
  • Classification (multiple)
• Terminology-related data: names, phraseology, ...
Standardization of terminological principles and methods
→ generic for many types of content entities
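The kinds of terminological data listed above can be sketched as a concept-oriented record. This is a simplified illustration, not a data model defined by any ISO/TC 37 standard: the class names, field names, identifier, classification values and definition text are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Designation:
    text: str
    kind: str       # e.g. "term", "abbreviation", "graphic symbol", "formula"
    language: str   # language code, e.g. "en"


@dataclass
class Description:
    text: str
    kind: str       # e.g. "definition", "explanation"
    language: str
    source: str = ""  # source-related data


@dataclass
class ConceptEntry:
    """Hypothetical entry: one concept, many designations and descriptions."""
    entry_id: str
    designations: List[Designation] = field(default_factory=list)
    descriptions: List[Description] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)  # multiple allowed
    record_status: str = "draft"  # data-management-related data

    def terms(self, language: str) -> List[str]:
        """Return all designations of kind 'term' in the given language."""
        return [
            d.text
            for d in self.designations
            if d.kind == "term" and d.language == language
        ]


# Placeholder content for illustration only.
entry = ConceptEntry(
    entry_id="example-0001",
    designations=[
        Designation("terminology", "term", "en"),
        Designation("terminologie", "term", "fr"),
        Designation("Terminologie", "term", "de"),
    ],
    descriptions=[
        Description("placeholder definition text", "definition", "en",
                    source="placeholder source"),
    ],
    classifications=["placeholder-domain-A", "placeholder-domain-B"],
)
french_terms = entry.terms("fr")
```

Because the entry is organized around the concept rather than around a single word, adding a further language only means appending designations and descriptions; the rest of the record is unchanged.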
Terminology? → content entities
Terminology? → knowledge representations
• Nomenclature, taxonomy, typology, partonomy, ...
• Glossary, vocabulary, ...
• Terminological phraseology
• Graphical symbols and other non-linguistic representations?
• Properties, characteristics, attributes, ...
• Ontology
• Names? → to be further studied
+ closely related:
• Thesauri, classification schemes, keywords
• Encyclopedic (knowledge) entries
  • Knowledge-enriched terminology entries
  • Names, proper names, ...
• Ontologies, topic maps, ...
→ ONE methodology
Terminology ↔ eContent
embedded terminology (or combination of terminology + …)
• Texts: → translation, localization, internationalization…
• Speech: → communication…
• Image: → CAD/CAM…
• Multimedia: → video, presentations…
knowledge-rich terminology
• Encyclopedic knowledge: Wikipedia…
• “Knowledge” management: → incl. true “content management”
  • document management,
  • communication management,
  • information management
“popularized” terminology
→ “Terminology and other language and content resources”
→ ONE methodology
Terminology today
Given its pervasive occurrence in all (written or spoken) domain communication, terminology today has to be considered an economic factor, especially in
• product data description and management (incl. eCatalogues and product classification)
• quality management
• inter-cultural aspects of management and marketing
• translation and localization
• information, documentation, software development
• knowledge transfer, teaching and training, …
→ Multilinguality and cultural diversity
→ terminology science as a field of fundamental research as well as applied R&D
→ impact on standardization
Terminology in ISO/TC 37
Multifunctional nature of terminology:
• Terminology as knowledge representation
• Terminologies as means of domain communication
• Terminologies as means of access to other kinds of information (objects)
• Terminologies as means of knowledge ordering at micro-level
+ Language resource management
Language resources:
• Text corpora → tagging (on the basis of grammar models)
• Lexicographical data
• Words
• Collocations
• Morphology
• Terminology
• Speech data
LR management:
• Input / import
• Metadata (incl. bundling/bindings etc.)
• Data modelling & metamodel(s)
• Exchange / interoperability
• etc.
+ other kinds of content entities
Textual & non-linguistic types of content:
• Audio information (e.g. read-out written content)
• AV information (e.g. sign language)
• Multimedia information
• Haptic information (e.g. in “intelligent cars”)
• …
Increasingly, different (technical) types of content co-occur, are embedded in each other or are combined with each other – e.g. traffic telematics
ISO/TC 37 – Standardization of terminological principles and methods
• Fundamental principles
• Vocabulary of terminology
• Terminography
• Language resource management
• Terminology work (especially systematic terminology work)
• Applications based on terminology methods
• Content management? → eContent → mContent
  • Multilingual, multimodal, multimedia, universal accessibility, multi-channel
  • Re-usability → interoperability/ies
  • Resource-sharing → peer2peer
ISO/TC 37
Old title: Terminology and other language resources
Old scope: Standardization of principles, methods and applications relating to terminology and other language resources
New title: Terminology and language and content resources
New scope: Standardization of principles, methods and applications relating to terminology and other language and content resources in the contexts of multilingual communication and cultural diversity
As is the case with terminologies, language resources in general have to be considered as multilingual, multimedia and multimodal from the outset.
→ Generic fundamental standards for all activities involving language
ISO/TC 37/SC 1 (1)
Title: Principles and methods
Old scope: Standardization of basic principles and methods for developing scientific and technical terminologies and other language resources
New scope: ??? still under discussion
ISO/TC 37/SC 1 prepares the meta-standards for the documents prepared by ISO/TC 37/SCs 2, 3 and 4, which cannot be consistent and coherent without these standards. The same applies to the documentation of content management in organizations.
ISO/TC 37/SC 1 (2)
The following standards are under the direct responsibility of ISO/TC 37/SC 1:
• ISO 704:2000 Terminology work – Principles and methods
• ISO 860:1996 Terminology work – Harmonization of concepts and terms
• ISO 1087-1:2000 Terminology work – Vocabulary – Part 1: Theory and application
The following standards are under preparation:
• ISO/CD 704 Terminology work – Principles and methods
• ISO/CD 860 Terminology work – Harmonization of concepts and terms
• ISO/PWI 1087-1 Terminology work – Vocabulary – Part 1: Theory and application
• ISO/WD 22134 Practical guide for socioterminology
ISO/TC 37/SC 2 (1)
Title: Terminography and lexicography
New scope: Standardization of terminological and lexicographical working methods, procedures, coding systems, workflows, and cultural diversity management, as well as related certification schemes
Tens of thousands of terminology commissions, committees and other terminological entities (especially terminology standardizing SCs and WGs within the standardization framework) are using ISO/TC 37/SC 2 standards. This indirectly improves the overall degree of re-usability and interoperability of the resulting data and documents.
ISO/TC 37/SC 2 (2) The following standards are under the direct responsibility of ISO/TC 37/SC 2:
• ISO 639-1:2002 Codes for the representation of names of languages – Part 1: Alpha-2 code
• ISO 639-2:1998 Codes for the representation of names of languages – Part 2: Alpha-3 code
• ISO 1951:1997 Lexicographical symbols and typographical conventions for use in terminography
• ISO 10241:1992 International terminology standards – Preparation and layout
• ISO 12199:2000 Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet
• ISO 12616:2002 Translation-oriented terminography
• ISO 15188:2001 Project management guidelines for terminology standardization
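As an illustration of how the alpha-2 codes of ISO 639-1 relate to the alpha-3 codes of ISO 639-2, the sketch below maps a few languages between the two. The table is a small hand-picked excerpt, not the full code tables, and the function names are hypothetical. Where ISO 639-2 defines both a terminological and a bibliographic alpha-3 code, the terminological one is used here.

```python
from typing import Dict

# Hand-picked excerpt: ISO 639-1 alpha-2 -> ISO 639-2 alpha-3 (terminological).
ALPHA2_TO_ALPHA3: Dict[str, str] = {
    "bm": "bam",  # Bambara
    "de": "deu",  # German (bibliographic code: "ger")
    "en": "eng",  # English
    "fr": "fra",  # French (bibliographic code: "fre")
    "sw": "swa",  # Swahili
}

# Reverse lookup built from the same excerpt.
ALPHA3_TO_ALPHA2: Dict[str, str] = {
    alpha3: alpha2 for alpha2, alpha3 in ALPHA2_TO_ALPHA3.items()
}


def to_alpha3(code: str) -> str:
    """Convert an alpha-2 code to alpha-3; raises KeyError if not in the excerpt."""
    return ALPHA2_TO_ALPHA3[code.strip().lower()]


def to_alpha2(code: str) -> str:
    """Convert an alpha-3 code to alpha-2; raises KeyError if not in the excerpt."""
    return ALPHA3_TO_ALPHA2[code.strip().lower()]


french_alpha3 = to_alpha3("FR")
bambara_alpha2 = to_alpha2("bam")
```

A real application would load the complete code tables from the registration authorities rather than hard-coding an excerpt like this one.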
ISO/TC 37/SC 2 (3) The following standards are under preparation:
• ISO/CD 639-3 Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages
• ISO/WD 639-4 Codes for the representation of names of languages – Part 4: Implementation guidelines and general principles for language coding
• ISO/WD 639-5 Codes for the representation of names of languages – Part 5: Alpha-3 code for language families and groups
• ISO/CD 639-6 Codes for the representation of names of languages – Part 6: Extension coding for language variation
• ISO/DIS 1951 Presentation/representation of entries in dictionaries
• ISO/CD 10241-1 Terminological entries in standards – Part 1: General requirements
• ISO/AWI 10241-2 Terminological entries in standards
• ISO 12615 Bibliographic references and source identifiers for terminology
• ISO/PWI TR 22128 Quality assurance guidelines for terminology products
• ISO/PWI 22130 Additional language coding
• ISO/NP 23185 Assessment and benchmarking of terminological holdings
ISO/TC 37/SC 3 (1)
Old title: Computer applications for terminology
New title: Terminology management systems and content interoperability
New scope: Standardization of principles and requirements for semantic interoperability, terminology and content management systems, and knowledge ordering tools
Software developers use the documents of ISO/TC 37/SC 3 to design terminology management systems (TMS) or terminology management modules to be integrated into content management as well as information and knowledge management systems. In this way the terminological principles and methods (provided by ISO/TC 37/SC 1) are directly integrated as ‘defaults’ into concrete system design for handling all kinds of information.
ISO/TC 37/SC 3 (2)
The following standards are under the direct responsibility of ISO/TC 37/SC 3:
• ISO 1087-2:2000 Terminology work – Vocabulary – Part 2: Computer applications
• ISO 6156:1987 Magnetic tape exchange format for terminological/lexicographical records (MATER) – withdrawn
• ISO 12200:1999 Computer applications in terminology – Machine-readable terminology interchange format (MARTIF) – Negotiated interchange
• ISO 12620:1999 Computer applications in terminology – Data categories
• ISO 16642:2003 Computer applications in terminology – Terminological markup framework
ISO/TC 37/SC 3 (3)
The following standards are under preparation:
• ISO/PWI TR 12618 Computational aids in terminology – Design, implementation and use of terminology management systems
• ISO/CD 12620-1 Computer applications in terminology – Data categories – Part 1: Model for description and procedures for maintenance of data category registries for language resources
• ISO/CD 12620-2 Computer applications in terminology – Data categories – Part 2: Terminological data categories
ISO/TC 37/SC 4 (1)
Title: Language resource management
Scope: Standardization of specifications for computer-assisted language resource management
Given the fact that
• linguistic infrastructures are being established or reinforced as part of the rapidly evolving information and communication society;
• professional activities involving language resource sharing and standardization are increasing in diverse areas: governmental or non-governmental organizations, public or private institutions, educational institutions, commercial enterprises, etc.;
• both globalization and localization necessitate multilingual communication;
there is an increasing need for new standardization as well as urgent recognition of existing de facto standards and their transformation into International Standards.
ISO/TC 37/SC 4 (2)
The following standards are under preparation:
• ISO/AWI 21829 Terminology for language resources
• ISO/CD 24610-1 Language resource management – Feature structures – Part 1: Feature structure representation
• ISO/WD 24611 Language resource management – Morphosyntactic annotation framework
• ISO/WD 24612 Language Resource Management – Linguistic Annotation Framework
• ISO/WD 24613 Language resource management – Lexical markup framework
• ISO/AWI 24614-1 Word segmentation of written texts for mono-lingual and multi-lingual information processing – Part 1: General principles and methods
• ISO/AWI 24614-2 Word segmentation of written texts for mono-lingual and multi-lingual information processing – Part 2: Word segmentation for Chinese, Japanese and Korean
• ISO/NP 24614-3 Word segmentation of written texts for monolingual and multi-lingual information processing – Part 3: Word segmentation for other languages
State-of-the-art
[Diagram relating METHODOLOGY standards to APPLICATIONS]
METHODOLOGY:
• ISO 16642*
• Datamodels: ISO 12200**
• Data categories: ISO 12620***
APPLICATIONS:
• (family of) metamodels*
• Datamodels** for eBusiness and other e...s
• Domain data dictionaries (DDDs)***
Basic principles and requirements concerning multilingual e/m-content development, data categories/metadata, data modelling, rules for repositories (maintained in MAs/RAs/Reg’s)
* ISO 16642 TMF; ISO 10303-11 EXPRESS; ISO 10303-21 SDAI; …
** ISO 12200 MARTIF; ISO 13584-42 PLIB ~ IEC 61360-2
*** ISO 12620 Data categories; ISO 13584-511 Fastener dictionary; IEC 61360-4 Core dictionary; …
→ Semantic interoperability standards
• Content-related requirements
• Workflow methodology
• Metadata
• Metadata repositories
• Data modelling principles and requirements
• Micro data models
• Metamodels
• Content repositories
• Federation of repositories
• …
CONFERENCES
• Terminology Summer School – Cologne (Germany) 2005-07-14/23
• TAMA 2005 “Terminology in Advanced Management Applications” – Wiesbaden (Germany) 2005-11-09
• TKE 2005 “Terminology and Knowledge Engineering” – Copenhagen (Denmark) 2005-08-15/19
• OFMR 2006 “Open Forum on Metadata Registries” – Japan 2006-03-20/22
Thank you for your attention
ISO/TC 37 Secretariat:
Secretary: Christian Galinski
Chairman: Håvard Hjulstad (SN)
ISO/TC 37 c/o Infoterm – International Information Centre for Terminology
ADDRESS:
Aichholzgasse 6/12
A-1120 Vienna – Austria
Tel: +43-1-817 44 99
Fax: +43-1-817 44 99-44
[email protected] http://www.infoterm.info