
Converting a Monolingual Lexical Database into a Multilingual Specialized Dictionary

Hee Sook Bae and Marie-Claude L’Homme
Observatoire de linguistique Sens-Texte
Département de linguistique et de traduction
Université de Montréal
C.P. 6128, succursale Centre-ville
Montréal QC H3C 3J7
{hee.sook.bae & mc.lhomme}@umontreal.ca

Abstract

The general objective of this work is to convert a pre-existing monolingual lexical database into a multilingual specialized dictionary. To this end, we developed a module for Korean based on a model and a methodology primarily designed for French in the domains of computing and the Internet. We will describe the different stages of our methodology: corpus processing, term extraction, term selection, sense distinction, establishment of equivalents, description of the actantial structure of each term sense and listing of semantically related terms. In this article, we first introduce the structure of the French database and present that of the Korean module with several examples. Then, we discuss some adaptations of the methodology to another language and specific linguistic phenomena observed during the process. Finally, we classify and discuss the problems related to the assignment of equivalents.

Keywords: multilingual specialized dictionary, lexical semantics, equivalents, actantial structure, semantically related terms

1 Introduction

The work presented in this article is part of a larger project dealing with the conversion of a pre-existing monolingual database into a multilingual dictionary. Our general objective is to expand a monolingual database containing French terms related to computing and the Internet into a multilingual dictionary in which several language modules are connected through a common model. In this article, we deal more specifically with the construction of the Korean module and the adaptations it required. We present the different steps of this adaptation and discuss some linguistic phenomena and problems specific to Korean that were observed while assigning equivalents. We believe that this preliminary work on Korean will contribute to the development of a more general model for different languages.

First, the French database (DiCoInfo, Dictionnaire fondamental de l’informatique et de l’Internet) is briefly introduced (Section 2). The dictionary describes fundamental terms in the fields of computing and the Internet and relies heavily on data gathered from a 1-million-word corpus. The dictionary is original in the sense that it focuses on the linguistic behavior of terms and provides detailed lexico-semantic information. The articles take into account the polysemy of terms and their actantial structures, and give long lists of semantically related terms along with a formal explanation of each relation.

Then, the structure of the Korean module is compared to that of the French database (Section 3). Some decisions based on the specific behavior of Korean terms will be justified. Section 4 presents the methodological steps for developing the Korean articles. The Korean module is constructed using the methodology devised for French. However, it is built separately, based on Korean material. In other words, French material is not translated into other languages. Korean corpora are used and analyzed by terminologists; all semantic distinctions are based on the behavior of Korean terms; the listing of semantically related terms relies on observations made in Korean. Equivalents are assigned once the descriptions for a given language are completed.

Finally, we discuss the relations drawn between French and Korean terms (Section 5). During this process, we observed several linguistic phenomena specific to Korean and some problems which arise when a monolingual model is adapted to other languages.

2 The original dictionary

As mentioned in the introduction, this work is based on a French monolingual database called the DiCoInfo (Dictionnaire fondamental de l’informatique et de l’Internet). 1 The DiCoInfo is a specialized French dictionary providing the description of the linguistic properties of terms pertaining to the fields of computing and the Internet (L’Homme 2006; L’Homme and Bae 2006). It focuses on basic terms, i.e. terms that can be found in different texts (e.g., ordinateur ‘computer’, programme ‘program’, Web) as opposed to terms that are linked exclusively to a specialization of computer science. Currently, the DiCoInfo contains over 900 completed articles (each article corresponds to a specific sense) and more than 20,000 related terms listed in the articles (on average, 24 relationships are identified for each term). Approximately 900 articles are still under construction.

2.1 Basic principles of the DiCoInfo

The DiCoInfo, contrary to most terminological databases, which are concept-based, relies on lexical semantics. While there are several large repositories of terms dealing with their conceptual aspects, there appears to be a lack of specialized resources focusing on their linguistic properties.2 This project was undertaken in order to provide a framework for developing such resources. More specifically, the DiCoInfo is based on the lexicological component of the Meaning-Text Theory, that is, Explanatory and Combinatorial Lexicology, ECL (Mel’čuk et al. 1995, 1984-1999; Polguère 2003). ECL considers lexical units as distinct senses and provides complete formal descriptions of their semantic and syntactic properties.3 In addition, ECL provides a complete formal apparatus to account for a large number of semantic relationships between lexical units, namely lexical functions.

Since it relies on a lexico-semantic framework, the DiCoInfo has a perspective on the notion of “term” which differs from that found in other terminological endeavors. Terms are viewed as lexical units the sense of which can be related to a specialized field of knowledge. (L’Homme 2005 lists a number of arguments to support this view against the background of other leading approaches to the notion of “term” in terminology.) This viewpoint on the term allows terminologists to select units which belong to different parts of speech. Hence, the DiCoInfo contains nouns (pourriel ‘spam’), verbs (configurer ‘configure’), adjectives (virtuel ‘virtual’), and adverbs (dynamiquement ‘dynamically’). A small number of phrases are also listed (sans fil ‘wireless’).

Also, the DiCoInfo is chiefly interested in single-word terms: terms formed of a single base, such as souris ‘mouse’, écran ‘monitor’; derivatives, such as blogueur ‘blogger’, navigateur ‘browser’; or terms composed of parts of otherwise autonomous words, such as courriel ‘email’ (which is the result of a composition of courrier ‘mail’ and électronique ‘electronic’). The DiCoInfo takes into account multi-word terms only if their meaning cannot be accounted for by adding the separate meanings of their constituent parts (traitement de texte ‘word processor’; boîte de dialogue ‘dialogue box’; glisser-déposer ‘drag and drop’).

The articles of the DiCoInfo are devoted to the description of specific senses. Specialized senses are discovered and delimited by examining the interactions of lexical units with others. Below is an example of this kind of distinction for the lexical form courriel ‘email’. This example is used since the distinctions also apply to the English form email. (More details on how senses are distinguished are given in Section 4.2.5.)

courriel 1, n. m.: the specific entity which is sent to someone (e.g. I sent an email to inform the staff of the changes)
courriel 2, n. m.: the mass noun (e.g. I receive a lot of email every day)
courriel 3, n. m.: the technology (e.g. This message was sent by email)

2.2 Structure of the articles

Currently, the articles in the DiCoInfo contain three different fields which are briefly described in this section. A fourth field containing a definition is being developed and will be added in the near future.

First, each separate meaning is accompanied by its actantial structure, which states the number and position of actants. We have reproduced below the actantial structures of the terms cyberespace ‘cyberspace’ and numériser ‘to scan’.

cyberespace, n.m.: cyberespace utilisé par AGENT pour intervenir sur PATIENT ‘cyberspace used by AGENT to act on PATIENT’
numériser, v.tr.: AGENT numérise PATIENT avec INSTRUMENT ‘AGENT scans PATIENT with INSTRUMENT’

As can be seen in the examples above, the actants are described in terms of actantial roles. Approximately 10 labels are used (refer to L’Homme 2006 for the complete list and the definition of each label). The most frequently used labels in the DiCoInfo are , , , and . In addition to specifying the semantic role of actants, the dictionary also provides a list of linguistic realizations for each actantial position. These realizations correspond to terms which are also dealt with in the dictionary (i.e. appear as headwords of separate articles). Table 1 gives an illustration for the term Internet.

Table 1 List of terms realized as actants
Internet, n. m.: Internet utilisé par AGENT pour intervenir sur PATIENT ‘Internet used by AGENT to act on PATIENT’

Agent                      Patient
internaute 1 ‘Web user’    blogue 1 ‘blog’
utilisateur 1 ‘user’       information 1 ‘information’
visiteur 1 ‘visitor’       page 2 ‘page’
                           portail 1 ‘portal’
                           ressource 2 ‘resource’
                           site 1 ‘site’

The articles also contain a list of contexts which are a selection of the concordances analyzed by terminologists to write the articles. Table 2 is an example for the term débogage ‘debugging’.

Table 2 Contexts extracted from the corpus
débogage, n. m.

CONTEXT: Un système typique peut consister en différents aspects, incluant les règles métier, la performance, la persistance des données, la journalisation des activités, le débogage, l'authentification, la sécurité.
SOURCE: ASPECT

CONTEXT: Les contrats présentent des avantages certains dans le développement logiciel, qu'il s'agisse de l'analyse et des spécifications, de la documentation, des tests et du débogage, de la robustesse, ou de la réutilisation de composants.
SOURCE: CONTRAT

CONTEXT: Ces messages peuvent être très utiles pour le débogage des règles de filtrage des noyaux et du masquerading.
SOURCE: LINUX3P3

The most complex field provides a complete description of all the semantic relationships a term has with other terms in the domains of computing and the Internet. Both paradigmatic (e.g. synonymy, antonymy, hyperonymy and hyponymy, meronymy and holonymy) and syntagmatic (i.e. collocations) relationships are taken into account. These relationships are encoded by means of lexical functions, LFs (Mel’čuk et al. 1995). Lexical functions allow a rigorous representation of linguistic relations between lexical units.4 In addition to the encoding with LFs, an explanation which refers to the actantial roles is also provided. Table 3 is an example for the term cyberespace.

Table 3 Semantically related terms
cyberespace, n. m.: cyberespace utilisé par AGENT pour intervenir sur PATIENT

Lexical function   Explanation of the relation                                  Related term
Syn                Synonym                                                      cyber-espace
Qsyn               Quasi-synonym                                                Internet 1
Loc-in             In a « key word »                                            dans le ~ ‘in ~’
Prepar1Real1       The agent gets ready to use the « key word »                 accéder 2 au ~ ‘to access ~’
S0Prepar1Real1     -> NOUN                                                      accès 2a au ~ ‘access to ~’
Real1              The agent uses the « key word »                              naviguer 1 dans le ~ ‘browse ~’
S0Real1            -> NOUN                                                      navigation 1a dans le ~ ‘browsing of ~’
Labreal12          The agent uses the « key word » to act on the patient        chercher 1 le patient dans le ~ ‘search for something in ~’
S0Labreal12        -> NOUN                                                      recherche 1a du patient dans le ~ ‘search for something in ~’
CausOper2          Someone or something puts the patient in the « key word »    diffuser 1 le patient dans le ~ ‘post something in ~’
S0CausOper2        -> NOUN                                                      diffusion 1a le patient dans le ~ ‘posting of something in ~’

3 The Korean module

Basically, the Korean module adheres to the theoretical principles on which its French counterpart relies. It considers terms as lexical units and contains headwords which belong to different parts of speech. So far, 115 of the 956 manually selected terms have been described. For these terms, 168 different senses were identified.5 A list of 2,538 semantically related terms has already been compiled.

The Korean DiCoInfo takes into account a large number of single-word terms: base terms, such as 파일 ‘file’, 검색 ‘search’, 자판 ‘keyboard’; and derivatives, such as 사용자 ‘user’, 관리자 ‘administrator’, 인증서 ‘authentication’, 명령어 ‘command’, 개발자 ‘developer’. In Korean, compound terms, such as 즐겨찾기 ‘bookmark’, 운영체제 ‘operating system’ and 스팸메일 ‘spam mail’, are also considered since their sense cannot be derived from the senses of their constituents. These combinations are defined as non-compositional sequences according to the following criteria: no word can be inserted between the constituents, and no constituent can be replaced by another. For example, 즐겨찾기 can be subdivided into two autonomous constituents: a nominalized verb 찾기 ‘search’ and an adverbial form 즐겨 ‘frequently’. However, no word can be inserted between 즐겨 and 찾기. In addition, the constituent 찾기 cannot be replaced by another word (*즐겨편집 ‘frequently edit’, *즐겨저장 ‘frequently save’, *즐겨삽입 ‘frequently insert’).

Regarding multi-word terms formed of several autonomous words separated by one or several spaces (무선 인터페이스 ‘air interface’, 아날로그 인터페이스 ‘analog interface’, 상용 소프트웨어 ‘commercial software’, 비상용 소프트웨어 ‘non-commercial software’, 압축 시스템 ‘compression system’, 설치 시스템 ‘installation system’6), these will be considered in future work if their meaning is non-compositional (especially since a constituent can be replaced by another word, as can be seen in the examples above).7

The Korean module contains the same fields as those given in the original dictionary with some minor adaptations. Each Korean term sense is accompanied by a French equivalent, an actantial structure, and a list of terms corresponding to the linguistic realizations of each actant. Semantically related terms are also listed in a separate table. For example, the description of the Korean term 웹 1 ‘Web’ first indicates its French equivalent: 웹 1 (Fr. Web 1)

Then, its actantial structure, again with actantial roles, and the linguistic realizations of actants are given (cf. Table 4).

Table 4 Actantial structure and linguistic realizations of actants
PATIENT-에 개입하기 위해 INSTRUMENT-을 통해 AGENT-이 이용하는 웹 1 ‘Web 1 used by AGENT for PATIENT with INSTRUMENT’

Agent                 Patient                  Instrument
사용자 1 ‘user’       데이터 1 ‘data’          웹브라우저 1 ‘Web browser’
네티즌 1 ‘netizen’    정보 1 ‘information’     브라우저 1 ‘browser’
방문자 1 ‘visitor’    웹페이지 1 ‘Web page’

Semantically related terms are listed and the relationships between terms are formalized with lexical functions. An additional explanation of the relationship is also provided (an example is given in Table 5).

Table 5 Related terms for 웹 1 ‘Web’

Lexical function   Explanation of the relation         Related term
Syn                Synonym                             WWW 1
Gener              Hyperonym                           네트워크 1 ‘network’
Gener              Hyperonym                           망 1 ‘network’
Holo               Holonym                             인터넷 1 ‘Internet’
Part               Meronym                             사이트 1 ‘site’
Labreal12          Agent-이 Key word-로 Patient-을     검색하다 1 ‘to search’
S0Labreal12        검색하다 -> NOUN                    검색 1 ‘search’
Real1              Key word-를 항해하다                항해하다 1 ‘surf’
Real1              Key word-를 방문하다                방문하다 1 ‘visit’

As can be seen in Tables 4 and 5, the structure of the articles in the Korean module is very similar to that of the original dictionary. Actantial role labels and lexical functions prove very useful to capture generalizations across languages. However, since word order is relatively free in Korean, actants in the actantial structure are accompanied by case markers. Case markers are also reproduced in the explanations of the semantic relationships between terms. We can see how this information is supplied in the last four lines of the second column in Table 5: in Agent-이 Key word-로 Patient-을, 이 (Rom. i) is the nominative marker (indicating the subject); 로 (Rom. ro) is the marker for instrument (indicating the oblique object); 을 (Rom. ul) is the accusative marker (indicating the object).
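To make the shape of a Korean article concrete, the fields described in this section can be sketched as a small data structure. This is only an illustrative sketch: the class and attribute names (`Actant`, `Relation`, `Article`) are hypothetical and do not reflect the actual encoding of the DiCoInfo database, and the rendering of the actantial structure is simplified (it omits the connecting words of the full Korean formula and keeps only the role labels with their case markers). The values are taken from the 웹 1 example.

```python
from dataclasses import dataclass, field


@dataclass
class Actant:
    role: str                 # actantial role label, e.g. "AGENT"
    case_marker: str          # Korean case marker attached to the actant, e.g. "이"
    realizations: list = field(default_factory=list)  # terms filling this position


@dataclass
class Relation:
    lexical_function: str     # e.g. "Real1"
    explanation: str          # explanation referring to the actantial roles
    related_term: str         # related term with its sense number


@dataclass
class Article:
    headword: str
    sense: int
    french_equivalent: str
    actants: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def actantial_structure(self) -> str:
        """Simplified rendering: ROLE-marker for each actant, then the headword."""
        parts = [f"{a.role}-{a.case_marker}" for a in self.actants]
        return " ".join(parts + [f"{self.headword} {self.sense}"])


# Hypothetical encoding of the article for 웹 1 'Web'.
web = Article(
    headword="웹",
    sense=1,
    french_equivalent="Web 1",
    actants=[
        Actant("PATIENT", "에", ["데이터 1", "정보 1"]),
        Actant("INSTRUMENT", "을", ["웹브라우저 1", "브라우저 1"]),
        Actant("AGENT", "이", ["사용자 1", "네티즌 1", "방문자 1"]),
    ],
    relations=[
        Relation("Syn", "Synonym", "WWW 1"),
        Relation("Holo", "Holonym", "인터넷 1"),
        Relation("Real1", "Key word-를 방문하다", "방문하다 1"),
    ],
)

print(web.actantial_structure())
```

Keeping the case marker on each actant, rather than relying on the order of the list, mirrors the point made above: since Korean word order is relatively free, the marker is what identifies the syntactic function of an actant.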

4 Methodology for building the Korean module

To construct the Korean module, we first prepare relevant material (i.e. corpora and concordances). Then, terms are selected using a combination of automatic and manual procedures: terms are extracted from the corpus using a term extractor; the list produced by the extractor is filtered and some original forms are restored manually; then a manual selection is made by a terminologist. Once terms are selected, their senses are distinguished. Finally, the articles are written following three different steps: the actantial structure of each term sense is defined, semantically related terms are listed and encoded, and French equivalents are assigned. Figure 1 illustrates the different steps of the methodology. They will be dealt with in detail in the following subsections.

[Figure 1: flowchart. Korean corpus -> Preparation of Korean material -> Selection of terms (macrostructure): automatic extraction (TermoStat), filtering of candidate terms, restoration of original forms, manual term selection, sense distinction -> Microstructure: actantial structure, semantically related terms, assignment of French equivalents]
Figure 1 Methodology for developing the Korean module

4.1 Preparation of Korean material

In this project, two corpora of a different nature were used: a large-scale general corpus of approximately forty million eojeols 8 and a corpus of approximately four million eojeols containing texts related to computing. The Korean domain-specific corpus is subdivided into various subfields (such as hardware, software, the Internet, programming, operating systems) and the general corpus is composed of different types of texts such as novels, plays, newspaper articles, etc. Table 6 gives more details on the composition of each corpus. As will be seen below, the general corpus was used exclusively for term extraction, whereas the domain-specific corpus is used at all steps of the analysis of terms.

Table 6 Korean corpora

Corpus               Size                  Text types
Specialized corpus   4 230 000 eojeols     hardware, software, Internet, programming, operating system, etc.
General corpus       42 000 000 eojeols    literature (novels, poetry, theater), society, health, utility, culture, history, etc.

The corpora were both morpho-syntactically analyzed. The list given below shows the results of the morpho-syntactic analysis of a sentence. Each group of Korean letters9 in the first column corresponds to an eojeol. In the second column, the tags supplied by the morpho-syntactic tagger are presented. The symbol “+” is inserted between tagged morphemes.10

이 PC            이/mmd PC/f
제품들은         제품/ncn+들/xsn+은/jxc
특정             특정/ncps
기능에           기능/ncn+에/jca
특화된           특화/ncn+되/xsv+ㄴ/etm
non-PC 형        non-PC 형/ncn
정보기기로써     정보기기/ncn+로써/jca
인터넷           인터넷/ncn
접속과           접속/ncpa+과/jcj
간단한           간단/ncps+하/xsm+ㄴ/etm
컴퓨팅이         컴퓨팅/ncn+이/jcs
가능하며,        가능/ncps+하/xsm+며/ecc+,/sp
사용하기         사용/ncpa+하/xsv+기/etn
쉬운             쉽/paa+은/etm
특징을           특징/ncn+을/jco
갖는다           갖/pvg+는다/ef
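The tagged format shown above (morphemes joined by “+”, each followed by “/” and its tag) is simple enough to be read back with a few lines of code. The following is a minimal sketch, assuming that neither “+” nor “/” occurs inside a morpheme itself; the function name `parse_eojeol` is hypothetical and is not part of the tagger used in this work.

```python
def parse_eojeol(analysis: str):
    """Split one tagged eojeol, e.g. '제품/ncn+들/xsn+은/jxc', into (morpheme, tag) pairs."""
    pairs = []
    for chunk in analysis.split("+"):
        # rpartition splits on the last "/", so the tag is always the final segment.
        morpheme, _, tag = chunk.rpartition("/")
        pairs.append((morpheme, tag))
    return pairs


print(parse_eojeol("제품/ncn+들/xsn+은/jxc"))
print(parse_eojeol("가능/ncps+하/xsm+며/ecc+,/sp"))
```

Splitting on the last “/” of each chunk keeps punctuation morphemes such as “,/sp” intact.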

To generate concordances from the corpus, we used a concordancer provided by the Korean research center KORTERM. For a given lexical item, the system provides all the sentences in which it appears. An example is given below for the term 인터넷 ‘Internet’.

이 PC 제품들은 특정 기능에 특화된 non-PC 형 정보기기로써 인터넷 접속과 간단한 컴퓨팅이 가능하며, 사용하기 쉬운 특징을 갖는다. ‘As non-PC type information tools, these PC products with specific functions simplify Internet access and computer operation, and they are easy to use.’
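The behavior of such a concordancer can be approximated as follows. This is a rough sketch and not the KORTERM tool: the function name `concordance` is hypothetical, and the matching rule (an eojeol counts as a hit if it begins with the search term, since particles and endings attach to the right of the stem) is an assumption made for illustration. A real system would rather match on the lemmas produced by the morpho-syntactic analysis.

```python
def concordance(sentences, term):
    """Return the sentences in which at least one eojeol starts with `term`."""
    hits = []
    for sentence in sentences:
        eojeols = sentence.split()  # eojeols are delimited by white space
        if any(eojeol.startswith(term) for eojeol in eojeols):
            hits.append(sentence)
    return hits


sentences = [
    "이 PC 제품들은 특정 기능에 특화된 non-PC 형 정보기기로써 인터넷 접속과 "
    "간단한 컴퓨팅이 가능하며, 사용하기 쉬운 특징을 갖는다.",
    "PC 에 전송된 내용은 파일의 형태로 저장된다.",
]

for hit in concordance(sentences, "인터넷"):
    print(hit)
```

With this prefix rule, a query for 파일 ‘file’ also retrieves the second sentence, where the term appears with a particle attached (파일의).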

4.2 Selection of terms

As was said above, the selection of terms is carried out using a combination of automatic and manual procedures leading to the definition of a list of headwords that are then described in separate articles. The five different steps of this selection and the decisions made by terminologists are described in the following subsections.

4.2.1 Automatic extraction

Term candidates are automatically extracted by a term extractor called TermoStat (Drouin 2003). The extractor compares a specialized corpus to a general corpus and applies statistical calculations to identify specific terms (terms which are more frequent in the domain-specific corpus). To extract term candidates that appear in a specialized corpus more often than can be predicted theoretically, TermoStat uses “a test-value threshold of +3.09, which means that the probability of finding the observed frequency is less than 0.001” (Drouin 2003: 101). It is assumed that these frequent units are more likely to be terms. In our work, the frequency of each lexical form in the corpus containing texts on computing is compared with that observed in the general corpus. Also, the calculations were applied to single-word terms regardless of their part of speech. Table 7 shows the first 10 candidates proposed by TermoStat.

Table 7 Examples of term candidates

Frequency   Term                     Specificity
12613       사용자 ‘user’            340.7149
8055        시스템 ‘system’          258.9969
7041        메시지 ‘message’         254.3181
1455        프로세스 ‘process’       115.0897
1455        파일 ‘file’              109.9741
4929        검색 ‘search’            107.6402
3228        경로 ‘directory’         103.0495
960         호스트 ‘host’            95.8143
1521        웹 ‘Web’                 92.8989
1062        소프트웨어 ‘software’    91.1562

4.2.2 Filtering of candidate terms

Before selecting terms from the list of term candidates extracted automatically by TermoStat, we discarded some items which caused noise. Units discarded include anglicisms, abbreviations, named entities and symbols.

Anglicisms: In the Korean corpus, English terms are frequently used: access, activity, adapter, address, agent, algorithm, applet, application, area, authorization. These English terms appear in their original forms, in transcribed forms (액세스 ‘access’, 이메일 ‘email’, 컴퓨터 ‘computer’) or in translated forms (전자편지 ‘email’, 주소 ‘address’). Furthermore, all the different forms can be found in the same text. In this work, we discarded English terms used in their original form. The other borrowed forms were considered as relevant units.

Abbreviations, named entities and symbols: In the corpus, abbreviations, symbols and named entities can also be found. Some examples are provided below:
1) Abbreviations: ADSL, CPU, PC, OS, URL, ID, IP, FTP, ROM, RAN;
2) Named entities: 윈도우 ‘Windows’, 리눅스 ‘Linux’, 텔넷 ‘Telnet’, 솔라리스 ‘Solaris’, 자바 ‘Java’, 펜티엄 ‘Pentium’;
3) Symbols: www, http, @.

These forms are selected only when they are used as common nouns. For example, the terms 웹 ‘Web’ and 펜티엄 ‘Pentium’ behave as common nouns since they can be accompanied by a modifier such as a numeral or an adnominal modifier.
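Part of this filtering can be automated, since English forms kept in their original spelling, Latin-script abbreviations and symbols contain no Korean characters. The sketch below keeps a candidate only if it contains at least one Hangul syllable. It is an illustration under that assumption, not the procedure actually applied in this work; the function name `keep_candidate` is hypothetical. Named entities written in Hangul (e.g. 윈도우 ‘Windows’) pass this test, so deciding whether they behave as common nouns remains a manual step.

```python
import re

# Precomposed Hangul syllables occupy the Unicode range U+AC00 to U+D7A3.
HANGUL_SYLLABLE = re.compile(r"[\uac00-\ud7a3]")


def keep_candidate(form: str) -> bool:
    """Keep a candidate term only if it contains at least one Hangul syllable."""
    return bool(HANGUL_SYLLABLE.search(form))


candidates = ["사용자", "access", "액세스", "CPU", "www", "주소", "@"]
kept = [c for c in candidates if keep_candidate(c)]
print(kept)
```

Transcribed forms such as 액세스 ‘access’ and translated forms such as 주소 ‘address’ are retained, in line with the decision to treat borrowed forms other than original English spellings as relevant units.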

4.2.3 Restoration of original forms

In Korean, a number of verbs composed of a nominal base and the suffix –하다 (Rom. -hada) can be found in texts on computing: 전송하다 (Rom. jeonsonghada) ‘transfer’, 삭제하다 (Rom. sakjehada) ‘delete’, 제어하다 (Rom. jeeohada) ‘control’, etc. The morpho-syntactic tagger used in this work automatically separates the base from the suffix, and assigns the tags “/ncps” or “/ncpa” to the base. As a result, ambiguous nominal forms can be found in the tagged corpus. For example, 저장 (Rom. jeojang) can be both a noun and a verbal base: 저장/ncn ‘backup’ as a noun and 저장/ncpa ‘save’ as the base of the verb 저장하다 (Rom. jeojanghada) ‘save’. We restored the original forms in our database, which resulted in 저장하다/pvg and 저장/ncn instead of the two entries 저장/ncpa and 저장/ncn. This restoration can be done without difficulty because the tagger assigns the verbal-base analysis (저장/ncpa) only when the form is actually used as a verb in the corpus.
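The restoration step can be illustrated with a short routine that merges a verbal base and the following suffix back into a dictionary form. This is a sketch under explicit assumptions: it handles only the pattern “base/ncpa + 하/xsv” (as in the analysis 사용/ncpa+하/xsv+기/etn shown in Section 4.1), it assigns the tag pvg to the restored verb as in the 저장하다 example, and it leaves every other morpheme untouched. The function name `restore_verbs` is hypothetical, and the treatment of bases tagged ncps is not covered here.

```python
def restore_verbs(morphemes):
    """Merge ('base', 'ncpa') followed by ('하', 'xsv') into ('base하다', 'pvg').

    `morphemes` is a list of (form, tag) pairs for one eojeol.
    """
    restored = []
    i = 0
    while i < len(morphemes):
        form, tag = morphemes[i]
        following = morphemes[i + 1] if i + 1 < len(morphemes) else None
        if tag == "ncpa" and following == ("하", "xsv"):
            restored.append((form + "하다", "pvg"))
            i += 2  # skip the suffix that was merged into the verb
        else:
            restored.append((form, tag))
            i += 1
    return restored


print(restore_verbs([("사용", "ncpa"), ("하", "xsv"), ("기", "etn")]))
print(restore_verbs([("접속", "ncpa"), ("과", "jcj")]))
```

In the second call the base is not followed by the verbal suffix, so the analysis is returned unchanged.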

4.2.4 Manual term selection

The list of candidate terms produced by TermoStat contains forms which cannot be considered as terms: 기존 ‘precedent’, 방법 ‘method’, 대하다 ‘it is about ~’, 연구 ‘research’, etc. These words were analyzed as specific units by the extractor since our domain specific corpus is composed of scientific articles. Although they are typically used in scientific literature, they cannot be considered as representative of the domains of computing and the Internet. Among the term candidates proposed by TermoStat, terms are selected using four lexico-semantic criteria.11 The first criterion requires that the candidate term denote an entity related to the field we are dealing with, namely computing and the Internet. This criterion is the one normally used in terminology projects and applies to entities denoted by nouns. However, since the DiCoInfo (its French and Korean components) considers terms which belong to other parts of speech, other criteria were devised in order to take them into account. The second criterion concerns the nature of the unit’s actants. If these actants are specialized according to the first criterion, then the unit is most likely to be a term itself. The third criterion considers that semantically related derivatives must also be defined as terms. Finally, the fourth criterion tests the term’s domain specificity through its paradigmatic relationships (for terms that are not morphologically related). According to this criterion, 클라이언트 ‘client’ is selected as a term because its contrastive term 서버 ‘server’ is selected according to the first criterion.

4.2.5 Sense distinction

After selecting terms according to the parameters given in the preceding subsections, we distinguish the different senses of polysemous units using another series of lexico-semantic criteria.

First, we distinguish the different senses of each form by testing the co-occurrence of actants (Mel’čuk et al. 1995). Also, synonyms, derivatives or compounds, and other paradigmatically related terms serve as bases for this distinction. For example, we distinguish two senses for 사용자 ‘user’ because it can enter into two different series of paradigmatically related terms (non-morphological relationships); one consists of 그룹 ‘group’, 인터넷 ‘Internet’, 컴퓨터 ‘computer’, 인증 ‘authentication’; and the other, 개발자 ‘developer’, 제공자 ‘supplier’, 응용프로그램 ‘application’.

4.3 Contents of the articles

Once terms are selected and thus considered as relevant headwords for separate descriptions, articles are written following three major steps which are described in the following subsections. The description of the actantial structure and the listing of semantically related terms rely on the observation of concordances generated from the corpus.

4.3.1 Actantial structure and linguistic realizations of actants

The actantial structure is composed of the headword and its actants. Having applied lexico-semantic criteria to the lexical form 저장하다, we distinguished two different senses, i.e. ‘save’ and ‘assign’, and obtained the following actantial structures:

저장하다 1 ‘save’: AGENT-이 SUPPORT-에 PATIENT-을 저장하다 1
저장하다 2 ‘assign’: AGENT-이 SUPPORT-에 PATIENT-을 저장하다 2

The actantial structure, thus represented, is the result of the observation of a large number of concordances extracted from the corpus. A sample is reproduced in Table 8.

Table 8 Concordances for 저장하다 1 ‘save’ and 저장하다 2 ‘assign’

저장하다 1 ‘save’

그 정보는 실시간으로 통합서버에 저장이 된다.
‘The data are saved on an integration server in real time.’

세션 객체에 저장되는 데이터들이 클라이언트 대신 서버에 저장이 되기 때문에 일정 시간 계속되면 서버의 자원을 많이 소모하게 된다.
‘Since the data saved during the session are saved on a server instead of on a client, if this state remains, the resources of the server are exhausted.’

오프라인 상에서는 테이프 저장 장치를 이용하여 고용량의 데이터를 안전하게 저장하고 관리하고 있다.
‘On an off-line state, we safely save and manage large amounts of data on a storage tape.’

PC 에 전송된 내용은 파일의 형태로 저장된다.
‘The contents transferred to a PC are saved as files.’

JPG 등의 이미지 파일로 저장하는 것도 가능하다.
‘It is possible to save an image file such as JPG.’

파일은 다른 이름으로 저장하다
‘Files can be saved under other names.’

저장하다 2 ‘assign’

환경 변수는 문자에 대한 포인터 배열의 포인터로 저장된다.
‘The variable is assigned by a value as a pointer.’

전체 변수 집합에 접근할 필요가 있는 드문 경우에는 환경변수 값을 어디든 저장할 수 있다.
‘When we need to access total variable sets, users can assign the value in the environment variable.’

프로그래밍을 하기 위해서는 특정 시점에서 값을 저장할 수 있는 저장 장소가 필요하다.
‘For programming, we need a storage in which users can assign the value.’

이러한 저장 장소를 변수라 한다.
‘We call this kind of storage to which the value is assigned a variable.’

기본 유형을 다음 세 위치에 서로 다른 엔터티로 저장할 수 있습니다.
‘We can assign the basic type data in following three positions as different entities.’

변수이지만 반드시 수치 값만 저장되는 것은 아니며 문자열이나 포인터 같은 좀더 복잡한 값도 저장될 수 있다.
‘We can assign not only a numeral value, but also strings or a pointer in this kind of variable.’

Contexts such as those listed in Table 8 allow us not only to analyze the actantial structure but also to discover the linguistic realizations of actants. As shown in Table 9, the linguistic realizations of the actants of 저장하다 1 differ from those of 저장하다 2.

Table 9 Linguistic realizations of actants

Entry: 저장하다 1 ‘save’
Agent: 사용자 1 ‘user’, 관리자 1 ‘administrator’
Patient: 정보 1 ‘information’, 자료 1 ‘resource’, 데이터 1 ‘data’, 파일 1 ‘file’, 이미지 ‘image’, 개체 ‘entity’
Support: 통합서버 1 ‘integration server’

Entry: 저장하다 2 ‘assign’
Agent: 프로그래머 1 ‘programmer’, 개발자 1 ‘developer’
Patient: 값 1 ‘value’, 수치 1 ‘numeral’, 문자열 1 ‘string’, 문자 1 ‘character’, 포인터 1 ‘pointer’
Support: 컴퓨터 1 ‘computer’, 디스크 1 ‘disk’, 하드디스크 1 ‘hard disk’, 저장소 1 ‘storage’, 변수 1 ‘variable’

4.3.2 Semantically related terms

Each article contains a list of semantically related terms. As was said above, paradigmatic and syntagmatic relationships are considered. We first extract terms which share a morphological relationship with a given key term. For example, the Korean term 설치 ‘installation’ is related to 설치자 ‘installer’ and 설치하다 ‘to install’. From the key term 설치, another noun 설치자 and a verb are derived respectively with a suffix 자 (Rom. ja) and a verbal suffix -하다 (Rom. –hada). We will see further that in Korean other morphological regularities which do not exist in French can be discovered in the corpus and then encoded in the database. We then search for synonyms, antonyms, holonyms, meronyms, hyperonyms and hyponyms. Table 10 shows how this first series of semantically related terms are listed for different senses of 주소 ‘address’.

Table 10 Three groups of paradigmatically related terms of 주소 ‘address’

주소 1 ‘address’        주소 2 ‘address’         주소 3 ‘address’
프로그램 1 ‘program’    인터넷 1 ‘Internet’      계정 1 ‘account’
포인터 1 ‘pointer’      단말기 1 ‘computer’      아이디 1 ‘ID’
바이트 1 ‘byte’         호스트 1 ‘host’          이메일 1 ‘email’
상수 1 ‘constant’       클라이언트 1 ‘client’    스팸메일 1 ‘spam mail’
문자 1 ‘character’      서버 1 ‘server’          수신자 1 ‘recipient’

We also extracted terms typically used with the key term, i.e. collocates, such as light verbs, adjectives, and adverbs. Table 11 shows some syntagmatically related terms for the three different senses of 주소.

Table 11 Three groups of syntagmatically related terms of 주소 ‘address’

주소 1 ‘address’            주소 2 ‘address’            주소 3 ‘address’
저장하다 2 ‘to assign’      공유하다 1 ‘to share’       등록하다 ‘to register’
저장 2 ‘assignment’         할당하다 1 ‘to allocate’    개설하다 ‘to create’
처리하다 2 ‘to process’     설정하다 1 ‘to set’         입력하다 1 ‘to input’
~테이블 ‘~table’            충돌하다 1 ‘to conflict’    발송하다 1 ‘to send’
버퍼 1 ~ ‘buffer ~’         변환하다 1 ‘to transit’     받다 1 ‘to receive’

4.3.3 Assignment of French equivalents

The equivalents are assigned manually, considering their actantial structures and semantically related terms. Semantic distinctions differ in each language, but since our articles are devoted to specific senses and not to lexical forms, the assignment of equivalents is facilitated. For example, 삭제하다 has two different senses and each sense corresponds to a different French equivalent: supprimer 1 ‘delete’ and désinstaller 1 ‘uninstall’. Similarly, 저장하다 ‘save’ is described in two different articles; one corresponds to the French sauvegarder 1 ‘save’; and the other, affecter 1 ‘assign’. Table 12 shows how Korean headwords and their French equivalents are represented in our database. Korean entries identified with sense numbers are matched with French equivalents which are also described with a specific sense number. The order of sense numbers has no significance other than to distinguish senses within the databases. Only nominalizations of verbs in French follow a systematic rule: first, their number corresponds to that assigned to the verb; secondly, senses of activity and senses of result are indicated respectively with a and b.
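Because equivalence holds between senses rather than forms, the links can be thought of as a lookup keyed on the pair (form, sense number). The sketch below is purely illustrative: the dictionary `EQUIVALENTS` and the two helper functions are hypothetical names, not the database schema, and only the four verb senses cited above are included.

```python
# Korean (form, sense number) -> French (form, sense number); values from the examples above.
EQUIVALENTS = {
    ("삭제하다", 1): ("supprimer", "1"),
    ("삭제하다", 2): ("désinstaller", "1"),
    ("저장하다", 1): ("sauvegarder", "1"),
    ("저장하다", 2): ("affecter", "1"),
}


def french_equivalent(form: str, sense: int) -> str:
    """Return the French equivalent of one Korean term sense."""
    lemma, number = EQUIVALENTS[(form, sense)]
    return f"{lemma} {number}"


def equivalents_of(form: str):
    """Return the equivalents of every recorded sense of a Korean form, in sense order."""
    senses = sorted(s for (f, s) in EQUIVALENTS if f == form)
    return [french_equivalent(form, s) for s in senses]


print(french_equivalent("저장하다", 2))
print(equivalents_of("삭제하다"))
```

Keying on the form alone would force a choice between supprimer and désinstaller for 삭제하다; keying on the sense avoids that ambiguity.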

Table 12 Korean entries and their equivalents

Korean terms | POS | French equivalents
사용자 1 ‘user’ | ncn | utilisateur 1
사용자 2 ‘end user’ | ncn | utilisateur final X
삭제하다 1 ‘delete’ | pvg | supprimer 1
삭제하다 2 ‘uninstall’ | pvg | désinstaller 1
파일 1 ‘file’ | ncn | fichier 1
저장 1 ‘backup’ | ncn | sauvegarde 1a
저장 2 ‘assignment’ | ncn | affectation 1a
저장하다 1 ‘save’ | pvg | sauvegarder 1
저장하다 2 ‘assign’ | pvg | affecter 1
검색 1 ‘search’ | ncn | recherche 1a

5

Linguistic phenomena and difficulties

The methodology described above, which had previously been developed for a monolingual French specialized dictionary, proved useful for constructing a module in a different language. The model was also used without requiring extensive adaptation, and in most cases even the assignment of equivalents could be carried out in an elegant manner. However, applying a model created for French to Korean served to highlight some differences between the two languages, as well as phenomena specific to each language which had an impact on our descriptions. This section discusses regular phenomena found in Korean and its specificities with respect to French.

5.1

Linguistic phenomena specific to Korean

We noted that a number of terms in the field of computing, when borrowed in Korean, were nominalized or verbalized. We will examine these two phenomena in turn.

5.1.1

Nominalization

The predominance of noun terms can be observed in many different languages and domains. In Korean terminology, this phenomenon is even more pronounced than in Indo-European languages because adjectival and verbal terms, when borrowed, are often transformed into nouns. For example, in the Korean equivalent of the English phrase public key, public is rendered as a noun, 공개 (interestingly, it syntactically maintains its role as modifier of the noun 키 ‘key’).

Nominalized terms can be further classified into three groups according to their behavior. This classification is presented below.

First group of nominalized terms

In Korean terminology, terms composed of two successive nouns (i.e. found in NN structures) are very frequent. The first group of nominalized terms is found most frequently in NN structures in which the first noun modifies the second, which is the head of the phrase. In this group, a suffix or a particle is hidden between the two nouns, more precisely after the first noun, which plays the modifier role. In the corpus, we found four typical suffixes and an adnominal particle:

–적 ‘of’
–성 ‘having the property of’
–형/–상 ‘having the form of’
–용 ‘for’
–의 ‘of’

In the examples given below, nouns placed in the first position in NN structures (가상 ‘virtual’, 휴대 ‘cellular’, 정보 ‘information’) modify the head nouns (현실 ‘reality’, 폰 ‘phone’, 시스템 ‘system’).12 The English adjective virtual in the term 가상(적)현실 ‘virtual reality’ corresponds to the Korean noun form 가상 ‘virtuality’ with a hidden suffix –적 ‘of’. In 휴대(용/형)폰 ‘cellular phone’, the adjectival constituent cellular is realized as a Korean noun with a hidden suffix –용 ‘for’.

가상(적)현실 ‘virtual reality’
휴대(용/형)폰 ‘cellular phone’
정보(의/용)시스템 ‘information system’

Second group of nominalized terms

The second group comprises nominalized terms which express a sense of activation. The typical suffix for activation in Korean is –화(化). This suffix corresponds to French derivational morphemes such as -ation or -age, or to a longer expression.

동기화 ‘synchronization’
양자화 ‘quantization’
자동화 ‘automation’
암호화 ‘encoding’
초기화 ‘initialization’
전산화 ‘computerization’

Third group of nominalized terms

The third group of nominalized terms consists of nouns composed of a verbal base (or an adjectival base) and a nominal suffix. To nominalize the verb (or adjective), the verbal ending –다 (Rom. -da) is first eliminated and a nominal suffix –기 (Rom. -gi) is added.

잘라내기 ‘cut’
붙여넣기 ‘paste’
즐겨찾기 ‘bookmark’
비우기 ‘empty’

In French or English, the infinitive forms of verbs can be used like nouns. For example, they can become the subject of a sentence. Moreover, in the menu of a French or English word processor, some commands take the form of infinitives: couper ‘cut’, coller ‘paste’. However, this is not permitted in Korean. For this type of usage, verbs (or adjectives) must first be nominalized. Afterwards, they can even be used as direct or indirect objects in a sentence.
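The formation rule for this third group (drop –다, add –기) is mechanical enough to be sketched in a few lines. The function name `nominalize_gi` is hypothetical, and the citation forms 잘라내다, 붙여넣다 and 즐겨찾다 are assumed here as the verbal sources of the menu items listed above (only 비우다 appears as such in this article). The sketch also assumes precomposed Hangeul syllables (Unicode NFC).

```python
def nominalize_gi(dictionary_form: str) -> str:
    """Replace the verbal ending -다 with the nominal suffix -기."""
    if len(dictionary_form) < 2 or not dictionary_form.endswith("다"):
        raise ValueError(f"not a dictionary form in -다: {dictionary_form!r}")
    return dictionary_form[:-1] + "기"


if __name__ == "__main__":
    # Assumed citation forms for the commands listed in the text.
    for verb in ("잘라내다", "붙여넣다", "즐겨찾다", "비우다"):
        print(verb, "->", nominalize_gi(verb))
```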

5.1.2

Verbalization of borrowed forms

Since the domain of computing is influenced by developments which have taken place in the West, Korean, like many other languages, borrows several terms from English. Korean computer scientists often use these English terms in their original forms or in transcribed forms (cf. Section 4.2.2). When a verbal form of a borrowed term is required, verbalization is carried out in two stages: the term is first nominalized, then verbalized. As shown in Table 13, nominalization is expressed with the English ending –ing or –tion (e.g. 인코딩 ‘encoding’ or 스캐닝 ‘scanning’) or with the Korean nominal suffix –화 (e.g. 양자화 ‘quantization’, 암호화 ‘encoding’ or 디지털화 ‘digitalization’); verbalization is realized with the verbal suffix –하다 (Rom. -hada) added to the previously nominalized form.

Table 13 Verbalization with –하다 (Rom. -hada)

Transcribed terms:
인코딩하다 (encode) → 인코딩 (encoding) + 하다 (Rom. -hada)
스캐닝하다 (scan) → 스캐닝 (scanning) + 하다 (Rom. -hada)

Sino-Korean terms:
양자화하다 (quantize) → 양자화 (量子化 quantization) + 하다 (Rom. -hada)
암호화하다 (encode) → 암호화 (暗號化 encoding) + 하다 (Rom. -hada)

Mixed terms:
디지털화하다 (digitize) → 디지털화 (digital + 化 -ation) + 하다 (Rom. -hada)
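The two-stage analysis in Table 13 can be read in reverse as a simple decomposition: strip –하다 to recover the nominalized form, then check whether that form carries the Korean suffix –화. The following is only an illustrative sketch with hypothetical function names (`split_hada_verb`, `uses_hwa`); it works on surface strings and does not replace a morphological analyzer.

```python
from typing import Optional, Tuple

HADA = "하다"  # verbal suffix (Rom. -hada)
HWA = "화"     # Korean nominal suffix 化


def split_hada_verb(verb: str) -> Optional[Tuple[str, str]]:
    """Return (nominalized form, '하다'), or None if the verb is not
    built on a nominal base followed by -하다."""
    if verb.endswith(HADA) and len(verb) > len(HADA):
        return verb[: -len(HADA)], HADA
    return None


def uses_hwa(nominal: str) -> bool:
    """True if the nominalized form ends with the suffix -화."""
    return nominal.endswith(HWA)


if __name__ == "__main__":
    for verb in ("인코딩하다", "스캐닝하다", "양자화하다", "디지털화하다"):
        nominal, suffix = split_hada_verb(verb)
        kind = "suffix -화" if uses_hwa(nominal) else "transcribed English form"
        print(f"{verb} -> {nominal} + {suffix} ({kind})")
```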

5.2

Difficulties resulting from the differences between languages

Some difficulties arose in the process of assigning French equivalents. The major problems can be classified into three groups: differences in the number of constituents in terms, differences in semantic distinctions and differences in lexical systems. We discuss these problems and how they were solved in the following subsections.

5.2.1

Difference in numbers of constituents in terms

A single-word term in one language often corresponds to a compositional multiword sequence in another language, and vice versa. For example, 큐잉 ‘queuing’ (Fr. mise en file d’attente) is considered a single-word term in the Korean database, while it corresponds to a much longer French expression. In addition, mise en file d’attente ‘queuing’ is not considered a term in the French database since its meaning is compositional. Similarly, French single-word terms, especially adjectives such as programmable, accessible, etc., often correspond to Korean phrases (프로그램이 가능한 and 접근 가능한). These phrases are considered expressions in the Korean module and are therefore not stored as entries.

5.2.2

Difference between semantic distinctions

There are some difficulties resulting from mismatches between the semantic distinctions made in each language. For example, the Korean term 제어 corresponds to the French term contrôle. However, in the French dictionary, contrôle has two different senses: “the activity of controlling” and “a specific key called control”. The latter sense is designated in Korean by a derivative (제어키 ‘control key’) or a compound term (제어 단추 ‘control button’ or 제어 장치 ‘control device’). Conversely, some Korean terms have two different senses, while the corresponding French term has only one. For example, the Korean term 사용자 is polysemous: 사용자 1 ‘user’ and 사용자 2 ‘end user’. In French, only one single-word term, utilisateur ‘user’, corresponds to 사용자 1. On the other hand, 사용자 2 ‘end user’ corresponds to a noun phrase, i.e. utilisateur final.

5.2.3

Difference in lexical systems

We also observed some differences between the French and Korean lexical systems. For example, Korean causative and passive derivatives are formed by adding derivational suffixes to lexical bases. In Korean, there are two types of causatives: morphological (or lexical) and syntactical. This work only had to deal with morphological causatives. For example, Korean adjectives are transformed into verbs by inserting a causative suffix (이, 히, 리, 기, 우, 구 or 추) between the base and the ending:

비다 adj. (Rom. bida) ‘be empty’13
비 (the base) + 우 (causative suffix) → 비우다
비우다 v.tr (Rom. biuda) ‘to empty’

The adjectival form 비다 ‘be empty’ means ‘a state of emptiness’. But the verbal form 비우다 ‘to empty’ has two different meanings: direct cause and indirect cause.

비우다 ‘X makes Y empty Z’
비우다 ‘X empties Y himself’

The verbal term 비우다 ‘to empty’ means to ‘cause a state of emptiness’. These two terms (비다 ‘be empty’ and 비우다 ‘to empty’) are frequent in our domain-specific corpus and have been added to the Korean module. A similar transformation can be found in verbal forms: e.g. 돌다 ‘to work’ and its causative form 돌리다 ‘make something work’. The first is an intransitive verb and the second is a transitive verb with two senses: “someone makes someone else work something” and “someone makes something work himself”. Passive forms present a similar problem: 나누다 ‘to partition’ corresponds to the French equivalent partitionner, but its passive form 나뉘다 does not have a lexical equivalent. In French, the latter corresponds to the grammatical passive form of the term partitionner ‘to partition’, while it is an autonomous lexical unit in Korean.
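The morphological causative pattern described in this subsection (base + causative suffix + ending) can be sketched as plain string concatenation. The function name `causative` is hypothetical. The choice of suffix is lexically determined, so the sketch takes it as an argument instead of predicting it, and it does not cover derivatives in which the suffix fuses with the base (the passive 나뉘다 cannot be obtained this way).

```python
CAUSATIVE_SUFFIXES = ("이", "히", "리", "기", "우", "구", "추")
ENDING = "다"  # dictionary-form ending (Rom. -da)


def causative(dictionary_form: str, suffix: str) -> str:
    """Insert a causative suffix between the base and the ending -다."""
    if suffix not in CAUSATIVE_SUFFIXES:
        raise ValueError(f"unknown causative suffix: {suffix!r}")
    if len(dictionary_form) < 2 or not dictionary_form.endswith(ENDING):
        raise ValueError(f"not a dictionary form in -다: {dictionary_form!r}")
    base = dictionary_form[: -len(ENDING)]
    return base + suffix + ENDING


if __name__ == "__main__":
    print(causative("비다", "우"))  # adjective 'be empty' -> verb 'to empty'
    print(causative("돌다", "리"))  # 'to work' -> 'make something work'
```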

5.2.4

Equivalence in an intermediate table

We solved the problems raised by the differences between the two languages discussed above by designing an intermediate equivalence structure, as shown in Table 14. The table contains four fields which are used for different purposes: two fields are presented to users to show exact equivalents; the other two fields allow us to link a record to the actual database in each language. Hence, the structure allows the matching of exact equivalents, and a term in a given language provides a link to headwords that are considered as true terms in each database.

Table 14 Intermediate table

Korean | I-Korean | I-French | French
비다 1 ‘be empty’ | 비다 1 | être vide | vide 1 ‘empty’
빠르다 1 ‘be fast’ | 빠르다 1 | être rapide | rapide 1 ‘fast’
명령어 1 ‘command language’ | 명령어 1 | langage de commande | commande 1 ‘command’
나뉘다 1 ‘be partitioned’ | 나뉘다 1 | être partitionné | partitionner 1 ‘to partition’
클릭 1 ‘click’ | 클릭할 수 있는 ‘possible to click’ | cliquable 1 | cliquable 1 ‘clickable’
업그레이드 1 ‘upgrade’ | 업그레이드 할 수 있는 ‘possible to upgrade’ | améliorable 1 | améliorable 1 ‘upgradable’
Link to the Korean database | Korean equivalent presented to the user | French equivalent presented to the user | Link to the French database

For example, the Korean term 빠르다 ‘be fast’ corresponds to the French expression être rapide which does not appear as the headword of an article since it is composed of a verb and an adjective and is compositional. In the intermediate table, we link être rapide ‘be fast’ to the adjectival entry rapide ‘fast’. This allows users to access the true equivalent of the Korean 빠르다 and still be able to consult an article in which être rapide will be described as a collocation of rapide.
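The four-field record of the intermediate table can be modelled as follows. This is a minimal sketch under stated assumptions, not the actual database schema: the class `EquivalenceRecord`, its field names and the helper `lookup_from_korean` are hypothetical, and the rows are those of Table 14.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EquivalenceRecord:
    korean: str    # link to a headword in the Korean database
    i_korean: str  # Korean equivalent presented to the user
    i_french: str  # French equivalent presented to the user
    french: str    # link to a headword in the French database


RECORDS = [
    EquivalenceRecord("비다 1", "비다 1", "être vide", "vide 1"),
    EquivalenceRecord("빠르다 1", "빠르다 1", "être rapide", "rapide 1"),
    EquivalenceRecord("명령어 1", "명령어 1", "langage de commande", "commande 1"),
    EquivalenceRecord("나뉘다 1", "나뉘다 1", "être partitionné", "partitionner 1"),
    EquivalenceRecord("클릭 1", "클릭할 수 있는", "cliquable 1", "cliquable 1"),
    EquivalenceRecord("업그레이드 1", "업그레이드 할 수 있는", "améliorable 1", "améliorable 1"),
]


def lookup_from_korean(headword: str) -> List[EquivalenceRecord]:
    """Return every record whose Korean database link is `headword`."""
    return [r for r in RECORDS if r.korean == headword]


if __name__ == "__main__":
    for rec in lookup_from_korean("빠르다 1"):
        # The user sees the exact equivalent; the link leads to a real article.
        print(f"shown: {rec.i_french}; article: {rec.french}")
```

Keeping the displayed equivalent separate from the database link is what allows a compositional expression such as être rapide to be shown to the user while the record still points to the article for rapide.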

6

Concluding remarks

In this article, we introduced a French specialized dictionary, the DiCoInfo, and the application of its structure and methodology to another language, namely Korean. We presented the different components of the Korean module and the methodology for developing it. We explained how the entries were selected and defined in five stages: automatic extraction of term candidates, filtering of the list, restoration of original forms, manual term selection and sense distinction. The contents of articles and the different tasks for designing them have also been presented and exemplified: the actantial structure, the linguistic realizations of actants, the list of semantically related terms, and the assignment of French equivalents. Interestingly, our methodology, first devised for a monolingual database, did not need to be significantly altered and proved useful to construct a module for a typologically different language. For example, even if the list of entries is defined in each language independently, most terms have a lexical equivalent. We also observed many similarities between the actantial structures and semantically related terms in the two languages. Of course, some differences were observed while compiling the Korean database (three forms of nominalization and specific cases of verbalization). Also, we encountered some difficulties. Some French equivalents could not be assigned directly to Korean terms, due to the specificities of each language. Three types of differences were discussed: difference in the number of constituents in terms, difference between the semantic distinctions made in each language, and differences in the lexical systems of both languages. To overcome these divergences, we designed an intermediate table which allows users to view exact equivalents while still being able to access articles in each database. To complete this project, some future work is planned. First, we will finalize the descriptions for the remaining selected terms. 
In addition, we will work on the different relations that can be drawn between French and Korean. We plan to develop a model for capturing the regularities observed between French derivational morphemes and their corresponding form in Korean.

Notes

1 The database, which is updated regularly, can be accessed through the Internet (http://olst.ling.umontreal.ca/dicoinfo/).

2 Interestingly, an increasing number of projects aim at providing rich linguistic information on terms (Binon et al. 2000; Jousse and Bouveret 2003).

3 According to the ECL framework, the lexical unit is an entity which has three components: the sense, the form and the combinatory properties (Mel’čuk et al. 1995).

4 “An LF, in Igor Mel’čuk’s “Meaning ↔ Text” model (Mel’čuk 1974, Mel’čuk and Zholkovsky 1984; Mel’čuk et al. 1984-1999), has the basic properties of a multi-value mathematical function. A prototypical LF is a triple of element {R, X, Y}, where R is a certain general semantic relation obtaining between the argument lexeme X (the keyword) and some other lexeme Y which is the value of R with regard to X (by a lexeme in this context we mean a word in one of its lexical meanings or some other lexical unit, such as a set expression)” (Apresjan et al. 2002: 55).

5 The average number of senses for each term is 1.461 (168/115).

6 Here, the English translations are literal to illustrate the behavior of Korean terms.

7 In this regard, Korean presents a problem seldom observed in French terms. The use of spaces or punctuation marks cannot be considered as clues for identifying fixed units in Korean because they are not used rigorously. According to the rules of correct Korean orthography, a space should be used to identify boundaries between words. But writing without a space is acceptable (한국어학회 ‘Korean Language Society’ 1987). For example, the multi-word term 압축 시스템 ‘compression system’ can be written with or without a space (압축시스템).

8 The eojeol is a Korean linguistic unit that is separated from others by a space or a punctuation mark. Basically, it consists of a content word and a function word.

9 The Korean writing Hangeul consists of 24 characters: 10 vowel letters (ㅏ, ㅑ, ㅓ, ㅕ, ㅗ, ㅛ, ㅜ, ㅠ, ㅡ, ㅣ) and 14 consonant letters (ㄱ, ㄴ, ㄷ, ㄹ, ㅁ, ㅂ, ㅅ, ㅇ, ㅈ, ㅊ, ㅋ, ㅌ, ㅍ, ㅎ). “In syllable-initial position 11 compound vowel letters and 5 geminated consonant letters occur in addition to the 10 simple vowel and 14 simple consonant letters and in syllable final position 15 double or geminate consonant letters occur in addition to the 14 single ones.” (Chang S. J., 1996:6)

10 The KAIST morpho-syntactic analysis system (http://morph.kaist.ac.kr/%7Emorph/) uses a tag set that comprises 69 tags. For the analysis of the example sentence, 20 tags are used.

11 For more information on lexico-semantic criteria which can be used to identify terms, refer to L’Homme (2004). In L’Homme and Bae (2006), the application of these criteria to term selection and sense distinction in the field of computing is described in detail.

12 In these three examples, the first two Korean nouns are realized as adjectives in English and only the third is realized in English as a noun form.

13 In Korean, the adjective is a syntactically predicative unit. This is why the adjective 비다 is translated into be empty in English. Bae (2006) gives more explanations on the functioning of Korean adjectives in general and specialized languages.

Acknowledgements

This work was financially supported by the Québec organization Fonds québécois de la recherche sur la société et la culture (FQRSC). Key Sun Choi of KORTERM supplied the Korean material (corpora and morpho-syntactic analysis system) and Patrick Drouin of OLST/Université de Montréal adapted TermoStat to enable us to extract terms from the Korean corpus. We would like to thank them for their help. Thanks are also extended to Victoria Surtees, who revised an earlier version of this article.

References

Apresjan, Ju. D., Boguslavsky, I. M., Iomdin, L. L., Tsinman, L. L. “Lexical Functions in NLP: Possible Uses”, in Computational Linguistics for the New Millennium: Divergence or Synergy? Festschrift in Honour of Peter Hellwig on the occasion of his 60th Birthday. Klenner, M., Visser, H. (eds.), Heidelberg: Peter Lang, (2002): 55-72.

Bae, H. S. “Termes adjectivaux en corpus médical coréen : repérage et désambiguïsation”, in Terminology 12, no. 1, Amsterdam/Philadelphia: John Benjamins, (2006): 19-50.

Binon, J., Verlinde, S., Van Dyck, J., Bertels, A. Dictionnaire d’apprentissage du français des affaires. Dictionnaire de compréhension et de production de la langue des affaires, Paris: Didier, 2000.

Chang, S. J. Korean, Amsterdam/Philadelphia: John Benjamins, 1996.

Drouin, P. “Term Extraction Using Non-Technical Corpora as a Point of Leverage”, in Terminology 9, no. 1, Amsterdam/Philadelphia: John Benjamins, (2003): 99-117.

Drouin, P. and Bae, H. S. “Korean Term Extraction in the Medical Domain by Corpus Comparison”, in Terminology and Knowledge Engineering, Copenhagen: Copenhagen Business School, (2005): 349-361.

Han, Y. K. et al. “KAIST 품사태깅 매뉴얼 (Manual of KAIST Morpho-syntactic Analyzer)”, in CS-TR, Daejeon, 2000.

Jousse, A. and Bouveret, M. “Lexical Functions to Represent Derivational Relations in Specialized Dictionaries”, in Terminology 9, no. 1, Amsterdam/Philadelphia: John Benjamins, (2003): 71-98.

Lee, I. S. and Im, H. B. 국어문법론 (Grammar of Korean language), Seoul: Hagyeonsa, 1983.

Lee, S. O. 국어의 사동·피동 구문 연구 (Research on Korean Causative and Passive Phrases), Seoul: Jipmundang, 1999.

L’Homme, M. C., dir. DiCoInfo, le dictionnaire fondamental de l’informatique et de l’Internet, http://olst.ling.umontreal.ca/dicoinfo/. 2006.

L’Homme, M. C. “Sur la notion de ‘terme’”, in Meta 50, no. 4, (2005): 1112-1132.

L’Homme, M. C. La terminologie: principes et techniques. Montréal: Presses de l’Université de Montréal, 2004a.

L’Homme, M. C. “Sélection des termes dans un dictionnaire d’informatique : comparaison de corpus et critères lexico-sémantiques”, in Actes. Euralex 2004, Lorient (France), (2004b): 583-593.

L’Homme, M. C. and Bae, H. S. “A Methodology for developing multilingual resources for terminology”, in International Conference on LREC (Language Resources and Evaluation), Genoa: Magazzini del Cotone Conference Center, 2006.

Lemay, Ch., L’Homme, M. C., Drouin, P. “Two Methods for extracting specific single-word terms from specialized corpora: Experimentation and Evaluation”, in International Journal of Corpus Linguistics 10, no. 2, (2005): 227-255.

Mel’čuk, I. “Lexical Functions: A Tool for the Description of Lexical Relations in the Lexicon”, in Lexical Functions in Lexicography and Natural Language Processing, Amsterdam/Philadelphia: John Benjamins, (1996): 37-102.

Mel’čuk, I. Opyt teorii lingvisticheskix modelej “Smysl ↔ Tekst” (A Theory of Meaning ↔ Text Linguistic Models), Moscow: Nauka, 1974.

Mel’čuk, I., Clas, A., Polguère, A. Introduction à la lexicologie explicative et combinatoire, Paris: Duculot, 1995.

Mel’čuk, I. et al. Dictionnaire explicatif et combinatoire du français contemporain. Recherches lexico-sémantiques I-IV, Montréal: Presses de l’Université de Montréal, 1984-1999.

Mel’čuk, I. and Zholkovskij, A. K. Tolkovo-kombinatornyj slovar’ sovremennogo russkogo jazyka (An Explanatory Combinatorial Dictionary of the Contemporary Russian Language), Wiener Slawistischer Almanach, Sonderband 14, 1984.

Office québécois de la langue française, Le grand dictionnaire terminologique, http://www.granddictionnaire.com/btml/fra/r_motclef/index1024_1.asp, accessed 16 May 2006.

Polguère, A. Lexicologie et sémantique lexicale : notions fondamentales, Montréal: Presses de l’Université de Montréal, 2003.

한국어학회 (Korean Language Society), ed. 한글맞춤법 (Korean orthography), Seoul: Korean Language Society, 1987.