Can syntactic variations highlight semantic links between domain ...

In 6th International Conference on Terminology and Knowledge engineering (TKE’02), Nancy 28-30 august 2002, 57-63.

Can syntactic variations highlight semantic links between domain topics?

Fidelia Ibekwe-SanJuan*, Cyrille Dubois**

* ERSICO, Université de Lyon 3, Manufacture des Tabacs, 4 cours Albert Thomas, 69008 Lyon. [email protected]
** Centre de Veille Technologique (CVT), Centre de Recherche Public Henri Tudor, 66 rue de Luxembourg, L-4009 Esch-sur-Alzette, Luxembourg. [email protected]

Abstract

This paper deals with the use of syntactic variations for portraying domain topics in a specialised corpus. After term extraction and term variant identification, a clustering algorithm, CPCL, is used to generate clusters of term variants which represent fairly well the layout of topics in the field considered. A close study of some class structures reveals that while no variation type can be said to be irrelevant for the task considered, binary term relations can lead to the formation of large heterogeneous classes.

1. Introduction

The necessity for corpus-based terminology processing has been stressed in recent times (Cabré 2000, Slodzian 2000). Indeed, corpus-based terminology processing captures the essence of expert vocabulary as exhibited in specialised texts. It can highlight important indicators of terminological evolution and concept structuring for applications such as domain terminology update or domain ontology construction. The application addressed here is concerned with mapping research topics through the analysis of a text corpus. Such an application can be useful for scientific and technological watch (STW). For this task, we posit that capturing variation relations amongst terms in a domain-coherent corpus and structuring these terms into classes will enable an expert user to gain better insight into how topics are structured and related in his field. Hence, it will increase his competitive intelligence.

Thus our work encompasses research on automatic term extraction and structuring on the one hand, and clustering algorithms for data analysis on the other. However, our approach to term extraction and clustering is wholly corpus-reliant. Our system does not need an external terminological database, lexical resource or thesaurus from which to draw knowledge to trigger its different tasks. It exploits the syntactic behaviour of only those terms that are in the corpus; it thus aims to show terminological evolution "as it appears", without the bias of a handmade resource. This approach is most appropriate for the application we are targeting, where an expert is keen on "sniffing out" emerging or evolving research topics in his field.

Previous empirical work on corpus-based terminology acquisition (Bourigault 1994, Katz & Justeson 1995) or updating (Daille 1994, Jacquemin & Royauté 1994) has abundantly shown that most terms in technical or specialised texts appear as multi-word units. Thus, in our approach, we will mostly be interested in clustering multi-word nominal terms (≥ 2 nominals) into classes. In previously published work (Ibekwe-SanJuan, 1998a), we claimed that the type of syntactic variations we studied suggest semantic links which, when clustered, depict the underlying structure of domain topics as contained in the corpus. Here, we wish to investigate these claims further. The questions we ask are: are all our variation relations relevant to this goal? Which ones are most relevant? Which ones produce noisy or irrelevant classes? Trying to answer these questions is the main focus of this paper.

To gain insight into this problem, we carried out the whole process of term extraction and clustering on a corpus of English scientific abstracts dealing with the bread making process. The corpus was built to meet the scientific and technological watch (STW) request of a company that wanted to integrate new and, if possible, natural additives in its bread making process. The URI/INIST[1] research team has worked on this corpus for this goal in collaboration with the CVT[2], which provided expert validation. In their study (François & Dubois, 2001), a different clustering algorithm was used (the axial k-means algorithm, a partitioning data analysis method), and clustering was not based on variation relations. We are carrying out data analysis on the same corpus using our variation-based clustering algorithm. Our results were likewise submitted to the same analyst from the CVT.

The rest of this paper is organised as follows: Section 2 briefly recalls the syntactic variations studied and describes our clustering algorithm; Section 3 presents the results obtained from the corpus and examines the internal and external structures of some classes; finally, Section 4 draws conclusions on the relevance of our syntactic variation relations for portraying the content and layout of topics in a specialised corpus.

[1] URI (Unité de Recherche et d'Innovation) / INIST (Institut de l'Information Scientifique et Technique), based in Nancy.
[2] Centre de Veille Technologique, CRP Henri Tudor, based in Luxembourg.

2. Extracting and clustering terms

2.1. Candidate term extraction

The first stage towards term clustering is term extraction. For this, we performed a shallow linguistic analysis by describing the morphological and syntactic features of terms in the form of finite state automata implemented in the INTEX linguistic toolbox (Silberztein, 1993). INTEX is equipped with linguistic resources to perform an overall morphological analysis of the texts and supply us with tags. These tags are then used in the different automata to describe term composition, and the automata are applied iteratively on the corpus until we reach a satisfactory medium-grained noun phrase splitting. Our approach to term extraction can be likened to the work of Bourigault (1994) and Katz & Justeson (1995) on that issue. However, unlike Bourigault (1994), our splitting approach does not produce redundant candidates (i.e. our approach is deterministic, no candidate string is embedded in another), and unlike Katz & Justeson (1995), we allow for the presence of more lexical components (prepositions). Precise details on the different automata we defined and their performance can be found in Ibekwe-SanJuan (2001).

Our splitting rules produce candidate terms which are subjected to manual filtering. For this corpus, an INIST indexer specialised in the field did the filtering. Unlike Daille (1994), we do not find statistical measures (especially frequency) to be adequate filters because, in most cases, they would filter out more than half of the candidate terms, namely those which occur only once (hapax). For our application, it would be particularly risky to eliminate so many terms solely on the basis of frequency or co-occurrence, as even a single occurrence of a term can be of interest to our type of user. After filtering, 3651 candidate terms were retained from a corpus of approximately 70 000 words.
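The risk of frequency-based filtering described above can be illustrated with a minimal Python sketch. It is not the filtering used in this work (which was manual); the function names, the `min_freq` parameter and the toy occurrence list are assumptions introduced here for illustration only.

```python
from collections import Counter

def hapax_share(occurrences):
    """Fraction of distinct candidate terms that occur exactly once."""
    counts = Counter(occurrences)
    hapax = [term for term, n in counts.items() if n == 1]
    return len(hapax) / len(counts)

def frequency_filter(occurrences, min_freq=2):
    """Keep only the candidate terms seen at least `min_freq` times (hypothetical filter)."""
    counts = Counter(occurrences)
    return {term for term, n in counts.items() if n >= min_freq}

# Toy data: the frequencies are invented, not taken from the corpus.
occurrences = [
    "dough stability", "dough stability",
    "wheat germ", "bran incorporation", "frozen dough",
]
share = hapax_share(occurrences)
kept = frequency_filter(occurrences)
print(share, kept)
```

On this toy list, three of the four distinct candidates are hapax, so a threshold of two occurrences would discard them even though any of them might signal an emerging topic.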

2.2. Syntactic variants identification

The next stage is relating candidate terms through syntactic variations. The founding works on automatic term variant identification can be attributed to Jacquemin (1994) and to Daille (1994). According to the former, syntactic variation refers to an occurrence of a term "which cannot be identified through the sole considerations of inflection or hyphenation". In other words, syntactic variations deal with transformations which are outside the scope of morphology. This loose definition allows for a rather wide range of operations, of which the most commonly investigated are "insertions, coordinations, and permutations" for Jacquemin (1994), or "overcomposition" and "modification" for Daille (1994). The syntactic variations we studied (Ibekwe-SanJuan, 1997) fall under these operations, though they were more restrictively defined in order to be easily interpretable by a domain expert. They can be viewed from a surface angle: variations that affect the initial length of a term, which we named "Expansions", and those that do not affect it, named "Substitutions".

Expansion is further subdivided along the grammatical axis: those that affect the modifier words in a term and those that affect the head word. Modifier expansion describes the relation between two candidate terms whereby a term t1 has the same elements as a term t2, save for the addition of some modifier words in a modifier position. For instance, we will say that "gas holding property of dough" is a left-expansion (L-Exp) of "gas holding property" because, by transformation to a nominal compound structure, "gas holding property of dough" will yield "dough gas holding property". Likewise, "bread dough quality characteristics" is an insertion variant (Ins) of "bread characteristics". Head expansion (R-Exp) describes the addition of one or more nominals in the head position of a term, thus shifting the former headword to a modifier position. Thus "frozen sweet dough baking" is an R-Exp of "frozen sweet dough". A fourth expansion variant combines the two elementary types, L-Exp and R-Exp, in that it describes the addition of words both in the modifier and head positions. We then talk of LR-Exp, for example the relation between "nonstarch polysaccharide" and "functional property of rye nonstarch polysaccharide" (rye nonstarch polysaccharide functional property).

Substitution is also defined along the grammatical axis to yield two sub-types: modifier and head substitution. Modifier substitution describes the replacing of one modifier word in term t1 by another word in term t2. Thus "bread dough leavening" is a modifier substitution (M-Sub) of "composite dough leavening". Head substitution (H-Sub) relates terms that share the same modifiers but have different heads: "effect of xanthan gum" and "addition of xanthan gum". Thus substitutions link terms of equal length where one and only one item is different. Our system can identify variations which occur in two syntactic structures: nominal compounds and noun phrases with prepositional attachment.
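The six relations defined above can be sketched in Python as a surface test on word sequences. This is a simplified illustration, not the authors' AWK implementation: it assumes that each term has already been rewritten in nominal compound order (modifiers first, head last), so that "gas holding property of dough" is given as "dough gas holding property". The function name `classify_variation` and the order in which the tests are applied are choices made here.

```python
from typing import Optional, Sequence

def classify_variation(t1: Sequence[str], t2: Sequence[str]) -> Optional[str]:
    """Return the relation that makes t2 a variant of t1, or None.

    Both terms are word sequences in compound order (head word last).
    """
    t1, t2 = tuple(t1), tuple(t2)
    n1, n2 = len(t1), len(t2)
    if n1 < 2 or n2 < 2:
        return None  # only multi-word terms are considered
    if n1 == n2:
        # Substitution: equal length, exactly one differing position.
        diff = [i for i in range(n1) if t1[i] != t2[i]]
        if len(diff) != 1:
            return None
        return "H-Sub" if diff[0] == n1 - 1 else "M-Sub"
    if n2 < n1:
        return None  # expansions only go from the shorter to the longer term
    if t2[n2 - n1:] == t1:
        return "L-Exp"   # words added to the left, head unchanged
    if t2[:n1] == t1:
        return "R-Exp"   # words added to the right, head shifted
    for start in range(1, n2 - n1):
        if t2[start:start + n1] == t1:
            return "LR-Exp"  # words added on both sides
    for k in range(1, n1):
        if t2[:k] == t1[:k] and t2[n2 - (n1 - k):] == t1[k:]:
            return "Ins"     # words inserted inside the term
    return None

print(classify_variation("bread characteristics".split(),
                         "bread dough quality characteristics".split()))
```

The examples quoted in this section can be used to check the sketch, for instance `classify_variation("frozen sweet dough".split(), "frozen sweet dough baking".split())`.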

2.3. Clustering term variants

We developed a clustering algorithm, CPCL (Classification by Preferential Clustered Link, Ibekwe-SanJuan 1998b), which clusters terms based on the variations described above. The six elementary variation relations are represented as a di-graph. Directed arrows represent expansion relations, which generate anti-symmetrical links between terms, i.e. term t1 is an expansion of term t2 but not the reverse. Non-directed arrows represent substitution relations, which generate symmetrical links between terms: if t1 is a substitution of t2, t2 is also a substitution of t1. Thus substitution relations are multiplied by two in the graph. A coefficient is computed for each type of variation relation. This coefficient is given as the inverse of the number of links for that variation type in the entire graph.

Clustering is a two-stage process. First, the CPCL algorithm builds connected components using a subset of the variation relations, usually the modifier relations (L-Exp, Ins, M-Sub). These connected components are sub-graphs which represent term variants that share the same headword (the same paradigm). At the second stage, the connected components are clustered into classes using the head relations (R-Exp, LR-Exp, H-Sub). This second stage groups together components whose terms are in one of the head variation relations. The formation of classes is based on the following principle: two components c1 and c2 are clustered if the link between them is stronger than the link between any one of them and a third component c3. The strength of this link is the sum of the coefficients for the head variation relations between the two components. The CPCL algorithm can be iterated several times to suit the user's requirement or until it converges, i.e. until classes are stable. A more formal description of this algorithm can be found in Ibekwe-SanJuan (1998b). The variation identification and the clustering programs have been implemented in the AWK language and can run on a Unix or Windows system.
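The two stages described above can be sketched as follows. This is a hedged reading of the prose, written in Python rather than AWK, and not the formal CPCL algorithm of Ibekwe-SanJuan (1998b): link direction is ignored, each link is counted once, only one clustering pass is made, and all names (`coefficients`, `connected_components`, `cluster_once`, `threshold`) as well as the toy graph are assumptions.

```python
from collections import Counter, defaultdict

MODIFIER_RELS = {"L-Exp", "Ins", "M-Sub"}   # stage 1: build connected components
HEAD_RELS = {"R-Exp", "LR-Exp", "H-Sub"}    # stage 2: cluster components into classes

def coefficients(links):
    """Coefficient of a relation type = 1 / number of links of that type."""
    counts = Counter(rel for _, _, rel in links)
    return {rel: 1.0 / n for rel, n in counts.items()}

def connected_components(terms, links):
    """Stage 1: union-find over modifier relations; maps term -> component id."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for a, b, rel in links:
        if rel in MODIFIER_RELS:
            parent[find(a)] = find(b)
    return {t: find(t) for t in terms}

def cluster_once(component_of, links, coef, threshold=0.0):
    """Stage 2 (one pass): merge c1 and c2 when their link is strictly
    stronger than any link between either of them and a third component."""
    strength = defaultdict(float)
    for a, b, rel in links:
        ca, cb = component_of[a], component_of[b]
        if rel in HEAD_RELS and ca != cb:
            strength[frozenset((ca, cb))] += coef[rel]
    neighbours = defaultdict(dict)
    for pair, s in strength.items():
        c1, c2 = tuple(pair)
        neighbours[c1][c2] = s
        neighbours[c2][c1] = s
    class_of = {c: c for c in set(component_of.values())}
    for pair, s in strength.items():
        c1, c2 = tuple(pair)
        rivals = [v for c in (c1, c2)
                  for other, v in neighbours[c].items() if other not in (c1, c2)]
        if s >= threshold and all(s > v for v in rivals):
            class_of[c1] = class_of[c2] = min(c1, c2)
    return {t: class_of[c] for t, c in component_of.items()}

# Toy graph (invented links between terms of the kind found in the corpus).
terms = ["dough stability", "sweet dough stability", "dough fermentation",
         "sour dough fermentation", "dough fermentation characteristic",
         "dough handling"]
links = [
    ("dough stability", "sweet dough stability", "L-Exp"),
    ("dough fermentation", "sour dough fermentation", "L-Exp"),
    ("dough stability", "dough fermentation", "H-Sub"),
    ("dough stability", "dough handling", "H-Sub"),
    ("dough fermentation", "dough fermentation characteristic", "R-Exp"),
]
coef = coefficients(links)
comp = connected_components(terms, links)
classes = cluster_once(comp, links, coef)
```

In this toy graph the single R-Exp link carries more weight than either H-Sub link, so the "dough fermentation" component is merged with "dough fermentation characteristic" while the "dough stability" component stays apart. Weighting each relation type by the inverse of its frequency is what lets rare relation types dominate the clustering.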

3. Analysis of class relevancy

In this particular experiment, after iterating the CPCL algorithm on our corpus, we chose the 2nd iteration as the one which offers the best partitioning of domain classes. 33 classes of variable sizes were obtained; the biggest class had 218 terms. The classes were then submitted to the domain analyst for validation. The analyst had to say if a class represented a coherent domain topic, name that topic, and determine if the external links between classes were sound. Table 1 below shows the topic represented by each class and the number of terms in it.

Class  Terms  Topic name
1      13     Measurement; thickness
2      8      Wheat bread ?
3      4      Anti staling
4      5      Properties (firmness; softener)
5      4      Spring wheat or red spring wheat
6      33     Natural components or elements
7      9      Natural oil
8      6      Salt effect
9      5      "Vegetable" seed
10     5      Frozen dough
11     7      Quality of ingredients
12     10     Physical properties
13     53     Stability (fermentation; storage), dough handling
14     32     Effect of adding various substances, bread dough working ?
15     4      Pump (uninteresting)
16     10     Kind of flour (= meal)
17     4      Flour effect
18     16     Starter
19     42     Bread final aspect
20     90     Molecules
21     28     Dough preparation methods / procedures ?
22     218    Physical or chemical parameters influence; enzymes
23     4      Fermentation
24     16     Dough physical properties
25     12     Rheology; dough; mixture
26     5      Sulphur
27     41     Acid effect
28     17     Water influence
29     19     Bread quality
30     55     Yeast
31     11     Enzyme action
32     198    ??? (too vast, heterogeneous)
33     12     ???

Table 1: Topic(s) represented by each class.

In most cases, the name chosen by the analyst for a class was taken directly from the class's content. Thus, terms extracted by our system were often adequate to name a class's topic. From this analysis, we can identify three categories of classes: (a) classes that represented known and relevant domain topics, whose names are given (twenty-six in number); (b) classes whose topics were partially or not at all identified, whose names are followed by a ? or represented by ??? in the table; and finally (c) classes whose topic, though identifiable, was uninteresting for STW (only one class is concerned). Among the relevant classes, one class (6) represented an emerging topic at the time of corpus constitution (1998), according to the analyst. In the sections below, we will examine the internal and the external structures of a few[3] classes from each category in order to determine if some variation types were more prevalent in certain categories.

3.1. Structure of some relevant topics

We chose two classes (13 and 6) of different sizes. Figures 1 and 2 show the internal structure of these classes, i.e. the term variants which resulted in the class formation. These figures exhibit rather connected structures, especially for class 13, where many of the links are initiated by anti-symmetrical relations (expansions). Class 6, which depicted an emerging topic, has a less interconnected structure, though this could simply be due to its smaller size. A closer look at the term variants in class 6 shows its vocabulary to be relatively specific: the variants around wheat germ, wheat bran and bran incorporation appeared only in this class. This finding is supported by the external position of this class (see Figure 7): class 6 does not occupy a central position and is linked to only two other classes (29 and 20). Note that class 6 contains variants that could point to interesting indicators for the STW request, i.e. the existence of new natural additives in the bread making process.

3.2. Structure of some (un)identified topics

It is interesting to note that Figures 3 and 4 (classes 21 and 33 respectively), which depict the internal structure of partially identified or unidentified topics, contain only symmetrical links. It is also interesting to note that all the variation relations in class 33 are made up of binary symmetrical links (two-word terms), while those in class 21 are made up of ternary symmetrical links (three-word terms). Indeed, binary variants, due to their length, tend to form abundant chains of symmetrical relations whose significance is not always clear. This is the main reason why class 32 (see Table 1) is too vast: most of its internal links were initiated by binary symmetrical relations. Moreover, these classes tend to have a wholly interconnected structure. Class 33 contained 12 terms, most of which were interconnected. We also observed the same interconnectedness of internal structure for a partially identified topic (class 2) whose 8 terms were all involved in variation relations (Figure 5), though not all were symmetrical links. This may indicate that the degree of interconnectedness is not an indicator of class relevancy.

Finally, we note that although the topic depicted by class 21 was partially identifiable, this class is only linked to class 10, whose topic deals with "frozen dough" (see Figure 7). A closer look at the term variants in class 21 shows them to revolve also around this concept (frozen dough product, frozen dough method, straight dough method, etc.). Hence, the link between the two classes is lexically motivated.

[3] Owing to space limitations.

3.3. Structure of an uninteresting topic

Class 15 (Figure 6) is a very small class with only 4 terms, of which 3 are in variation relations. These variations are all anti-symmetrical relations (expansions). We cannot, however, conclude from this single example that such links indicate topic irrelevancy. Indeed, it should rather be the reverse, as the term variants involved here (multi-word terms and anti-symmetrical variations) normally underline shifts in either a property (modifier words) or the concept (head word). Moreover, the external position of this class seems coherent with the interpretation in that it is linked to only one class (see Figure 7). We note also that the topic represented by this class was identifiable but deemed uninteresting for the task considered (scientific and technological watch). It may then correspond to a residual research issue (dough pump technology).

3.4. Overall topics layout

Figure 7 shows almost all the classes to be in one network. Four classes appear isolated in this figure. This may be because their external links were below the threshold we considered for clustering (0.001). Apart from class 32, which is linked to a lot of other classes due to its heterogeneity, the general layout of domain topics appeared relevant to the analyst. The external links between classes 20 and 22, for example, are lexically and semantically justified. Class 22 is always linked to classes which have a meaning for the practitioner: rheology (25) is a physical parameter, just as enzyme (31) or water (28) are chemical parameters that influence the final product. Class 20, central in its links with classes 6, 29 and 14, seems to deal with the effect of different components/molecules on bread quality. The same relevance of external links is observed among the other classes analysed. Thus, the syntactic variation methodology used here to portray the external links can be considered promising.

[Graph not reproduced. Node labels as extracted: fermentation stability; improved dough handling; detrimental dough handling property; dough handling; dough handling property; dough handling characteristic; dough stability; fermentation quotient; improved dough stability; dough fermentation; sweet dough stability; sour dough fermentation quotient; wheat dough fermentation quotient; dough fermentation characteristic; frozen dough stability; sweet dough fermentation; frozen dough storage gluten fortified frozen dough storage stability; sour dough fermentation; frozen prefermented dough fermentation stability]

Figure 1. Class 13: Stability (fermentation, storage); dough handling

[Graph not reproduced. Node labels as extracted: wheat germ enrichment effect; defatted germ; wheat germ; wheat germ effect; wheat bran; wheat germ incorporation; rye bran; wheat flour; rye bran supplementation; wheat bran incorporation; wheat flour supplementation; wheat flour blend; wheat flour fractionation]

Figure 2. Class 6: Natural components or elements

[Graph not reproduced. Node labels as extracted: frozen dough product; frozen dough characteristic; frozen dough method; straight dough method; straight dough process; straight dough system; straight dough procedure; sponge dough bread; sponge dough procedure; sponge dough yield]

Figure 3: class 21 (dough preparation methods / procedures ?)

[Graph not reproduced. Node labels as extracted: lag phase; maturing phase; gluten phase; liquid phase; gluten fraction; fine fraction; liquid fraction; glutenin fraction; starch fraction; carbohydrate fraction; lipoxygenase fraction; wip fraction]

Figure 4: class 33 (???)

[Graph not reproduced. Node labels as extracted: spring wheat cultivar; gluten sample; classified bread sample; wheat cultivar; wheat sample; bread sample; bread cultivar; european common wheat sample]

Figure 5: class 2 (wheat bread ?)

[Graph not reproduced. Node labels as extracted: dough pump; new dough pump; new dough pump from campbell technology inc]

Figure 6: class 15 (pump)

[Graph not reproduced. Node labels as extracted: class 1: measurement; stickiness / class 30: yeast / class 29: bread quality / class 10: frozen dough / class 6: natural components / class 15: pump / class 24: dough physical properties / class 20: molecules / class 32: ??? / class 21: dough preparation methods / procedures ? / class 14: effects of various substances / class 33: ??? / class 22: physical or chemical properties; enzymes / class 27: acid effect / class 13: stability / class 18: starter / class 19: bread final aspect / class 2: wheat bread ? / class 25: rheology; dough; mixture / class 4: properties (firmness; softener) / class 31: enzyme action / class 16: kind of flour (meal) / class 12: physical properties / class 8: salt effect / class 28: water influence / class 9: vegetable seed]

Figure 7: External layout of classes

4. Concluding remarks

To the question "can syntactic variations portray semantic links between domain topics", the answer would seem to be yes, judging from the two empirical experiments on different corpora that we have carried out to date using our method (see Ibekwe-SanJuan 1998a). Though quite elementary in nature, the syntactic variation relations we studied seem to yield significant meanings for a domain expert seeking to grasp the structure of research topics in his field. Also, the term variants extracted were often adequate to name a class's topic. In other clustering methods, experts have to reformulate most of the classes' topics, as the items used for clustering (single words or keywords) are often inadequate for this task. Empirical evidence also shows that variation phenomena, far from being accidental, are very frequent in specialised texts. About 80% of the candidate terms extracted were in one of the variation relations we studied. Thus, structuring variation relations is a way of representing the associations between domain concepts.

In a similar yet different approach, Bouaud et al. (1997) explored the conceptual organisation yielded by local syntactic dependencies amongst simple NPs. They used grammatical relations (head and modifier relations) to cluster simple terms into connected components, also called classes. In their study, complex nominals were split into simple head-modifier structures (i.e. 'Adj-N' or 'N-prep-N' structures). The graphs they obtained showed mostly the contexts in which a set of nouns or adjectives were employed by other domain head nouns or modifiers. Unlike our CPCL algorithm, their clustering algorithm (Zellig) does not partition the grammatical relations into two. Also, clustering is not hierarchical since there is no notion of "strength of link" between two components. At a given time, their algorithm seeks the maximal sets of connected components in the graphs. The application they targeted is an initial stage for acquiring relations for domain ontology construction.

We have sought to explore further in this paper whether any type of variation relation we studied was more or less relevant for STW. Our findings, based on the few classes examined here, cannot be conclusive. Nevertheless, our summary examinations show quite logically that binary term variants in symmetrical relations (head and modifier substitution) often yield large heterogeneous classes whose topics are difficult to identify (classes 32, 33 and to some extent class 22). This is because substitution, as we defined it, occurs much more easily in binary than in longer terms. To deal with the problem of large heterogeneous classes at the term clustering level, a solution may be to increase the threshold of external links at which connected components can be clustered, but we have to investigate the consequences carefully.

Notwithstanding this handicap, binary substitution variants have not proven irrelevant for the task at hand. On the contrary, they initiated the links between several classes, as shown in the first column of Table 2 below. We sought to know if the chain formed by these links had any meaning for the domain specialist. In the case of binary head substitution (H-Sub) variants, the analyst concluded that the chain formed by binary variants highlighted the set of "concepts" that influenced a particular "object" or another domain concept, here "dough". Moreover, some conceptual interdependencies were observed amongst the chain formed by the binary term variants. For instance, "improver", "leavening", "acidity" and "temperature" are parameters which influence the final bread aspect, i.e. its "stability", "weakening", "structure" and "tolerance". Also, "dough handling" plays a role in the bread preparation. In the case of modifier substitution (M-Sub) variants (column two in Table 2), the chain of relations highlighted the same "concept" family. This concept should be related to the grains (the flavour used) which will influence the texture of the bread. It would then seem, at first sight, that binary substitution variants mostly account for the closely-knit network of topics on the external level (Figure 7), since they initiated most of the external links.

Binary H-Sub variants            Binary M-Sub variants
dough handling (cl. 13)          bread texture (cl. 19)
dough stability (cl. 13)         endosperm texture (cl. 19)
dough weakening (cl. 24)         crumb texture (cl. 19)
dough structure (cl. 19)         good texture (cl. 19)
dough improver (cl. 20)          grain texture (cl. 19)
dough parameter (cl. 20)         harder texture (cl. 19)
dough acidity (cl. 20)           loaf texture (cl. 19)
dough temperature (cl. 20)       softer texture (cl. 19)
dough level (cl. 20)
dough leavening (cl. 22)
dough tolerance (cl. 22)

Table 2. Binary substitution variants

The question of the relevancy of substitution variation becomes more interesting as we consider longer terms (>2 words). For instance, given the three-word substitution variants below (Table 3), it was obvious to the analyst that they represented the same "property" family, "frozen dough", for the H-Sub variants and the same "concept" family, "sour bread", for the M-Sub variants (though it would be more prudent to gather more empirical results, especially for cases where the position of the substituted word changes).

Ternary H-Sub variants                  Ternary M-Sub variants
frozen dough baking (cl. 10)            sour corn bread (cl. 21)
frozen dough characteristic (cl. 21)    sour dough bread (cl. 21)
frozen dough method (cl. 21)            sour maize bread (cl. 21)
frozen dough product (cl. 21)

Table 3. Ternary substitution variants

From our brief survey, it would seem also that the number of internal links alone cannot determine topic relevancy. There were many interconnected classes amongst the relevant, unidentifiable and uninteresting classes. Thus, determining a class's topic and its relevancy still lies within the scope of the domain expert because it needs more background knowledge, unavailable in our system. However, the clusters generated by our syntactic variations often depicted coherent associations between relevant domain topics which assist the expert in his STW task.
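The remark that substitution arises much more easily between binary terms than between longer ones can be illustrated with a small Python sketch. It is an illustration under simplifying assumptions, not part of the system described here: terms are compared word by word in their surface order, and the function name `substitution_pairs` and the two short term lists (drawn from Tables 2 and 3) are chosen for the example.

```python
from itertools import combinations

def substitution_pairs(terms):
    """Pairs of equal-length terms that differ in exactly one word."""
    pairs = []
    for t1, t2 in combinations(terms, 2):
        w1, w2 = t1.split(), t2.split()
        if len(w1) == len(w2) and sum(a != b for a, b in zip(w1, w2)) == 1:
            pairs.append((t1, t2))
    return pairs

binary = ["dough handling", "dough stability", "dough weakening",
          "dough structure", "dough improver"]
ternary = ["frozen dough baking", "frozen dough method",
           "sour dough bread", "sour corn bread"]

binary_links = substitution_pairs(binary)
ternary_links = substitution_pairs(ternary)
print(len(binary_links), len(ternary_links))
```

Two-word terms need only one shared word to be substitution variants of each other, so five terms sharing "dough" yield ten symmetrical links, whereas three-word terms must share two words and the four ternary terms yield only two. This is one way of seeing why binary substitution chains can grow into large heterogeneous classes.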

References

Bouaud J., Habert B., Nazarenko A., Zweigenbaum P., 1997. Regroupements issus de dépendances syntaxiques sur un corpus de spécialité : catégorisation et confrontation à deux conceptualisations du domaine. Actes des 1ères journées d'Ingénierie des Connaissances, Roscoff, 207-223.

Bourigault D., 1994. LEXTER, un Logiciel d'Extraction Terminologique. Application à l'acquisition des Connaissances à partir de textes. Ph.D. thesis, EHESS, Paris, 352p.

Cabré M.T., 2000. Sur la représentation mentale des concepts : bases pour une tentative de modélisation. In Béjoint H. & Thoiron Ph. (eds), Le sens en terminologie. Presses Universitaires de Lyon, 20-39.

Daille B., 1994. Study and implementation of combined techniques for automatic extraction of terminology. Workshop of the 32nd Annual Meeting of the ACL, Las Cruces, New Mexico, USA, 9p.

François C., Dubois C., 2001. Utilisation d'un système d'analyse de l'information dans le processus de veille scientifique et technique : pratiques collaboratives induites. 3rd Congress of the French chapter of ISKO, Paris, 5-6 July 2001, 79-87.

Ibekwe-SanJuan F., 1998a. Building a prototype system for trends survey through the use of term variants. 1st Workshop on Computational Terminology (Computerm'98), Montréal, 15 August 1998, 22-28.

Ibekwe-SanJuan F., 1998b. A linguistic and mathematical method for mapping thematic trends from texts. 13th European Conference on Artificial Intelligence (ECAI'98), Brighton, UK, 23-28 August 1998, 170-174.

Ibekwe-SanJuan F., 2001. Extraction terminologique avec INTEX. Journées INTEX, Bordeaux, 10-11 June 2001, 13p. In press.

Jacquemin C., Royauté J., 1994. Retrieving terms and their variants in a lexicalized unification-based framework. ACM-SIGIR 94, Dublin, July, 132-141.

Katz S.M., Justeson J.S., 1995. Technical terminology: some linguistic properties and an algorithm for identification in text. Journal of Natural Language Engineering, 1(1), 19p.

Silberztein M., 1993. INTEX© manual. ASSTRIL LADL, 201p.

Slodzian M., 2000. L'émergence d'une terminologie textuelle et le retour du sens. In Béjoint H. & Thoiron Ph. (eds), Le sens en terminologie. Presses Universitaires de Lyon, 61-85.
