
Lightweight Domain Ontology Learning from Texts: Graph Theory Based Approach using Wikipedia

Abstract: Ontology engineering is the backbone of the Semantic Web. However, the construction of formal ontologies is a tough exercise which requires substantial time and cost; ontology learning is a response to this requirement. Since texts are massively available everywhere and embody experts' knowledge and know-how, it is of great value to capture the knowledge they contain. Our approach addresses the challenge of creating concepts' hierarchies from textual data, taking advantage of the Wikipedia encyclopedia to achieve good quality results. This paper presents a novel approach which essentially uses plain text Wikipedia articles instead of its categorical system and works with a simplified algorithm to infer a domain taxonomy from a graph.

Keywords: domain ontologies, ontology learning from texts, concepts' hierarchy, graph normalization, Wikipedia.

1 Introduction

Ontology engineering is a hard task which is time-consuming and quite expensive. The difficulty of designing such artifacts manually stems mainly from the knowledge acquisition bottleneck, a commonly known issue: the needed knowledge is related to human know-how, and the most important share of it is retained in people's heads. It should be noted that the main application of such a designed artifact, i.e. the ontology, is the Semantic Web [1]. Furthermore, its success depends on the proliferation of ontologies, which requires speed and simplicity in engineering them. The need for (semi-)automatic domain ontology extraction has thus been felt rapidly by the research community; ontology learning is the research realm it refers to. Concepts' hierarchy building stands at the heart of the ontology learning process, and the construction of such structures has recently been supported by graph theory [2]. In fact, to support reasoning with a deterministic search space, hierarchies with single inheritance are preferable; this is a central objective our approach tries to reach. Meanwhile, Wikipedia has recently shown new potential as a lexical semantic resource [3]. While this collaboratively constructed resource is used to compute semantic relatedness [4, 5] through its category system, that same system is also used to derive large scale taxonomies [6] or to achieve knowledge acquisition [7]. Our approach capitalizes on the well organized Wikipedia articles to retrieve the most useful information of all, namely the definition of a concept. More precisely, the approach we present is somewhat novel in that it, first, uses the plain text of Wikipedia instead

of its category system, as most existing approaches do, and, second, is based on graph normalization. The remainder of the paper is organized as follows: Section 2 presents the ontology learning realm; its layer cake structure is briefly depicted in Section 3. Section 4 discusses the exact formulation of the problem as well as related previous work. In Section 5, we explain our approach's architecture and give some implementation details. This is followed, in Section 6, by the results of our approach's evaluation. Finally, conclusions and a summary of possible research prospects are given in Section 7.

2 Ontology Learning

Ontology learning is the term given to the automatic or semi-automatic support for the construction of an ontology. This field has the potential to reduce the cost of creating an ontology, and for this reason a plethora of ontology learning techniques has been proposed. Since the full automation of these techniques remains in the distant future, the ontology learning process is argued to be semi-automatic, with a persistent need for human intervention. According to [8], four characteristics make it possible to distinguish ontology learning techniques. These parameters are:

1. Automation (semi-automatic, fully automatic)

2. Ontological knowledge to extract (concepts, taxonomy, non-taxonomic relationships, instances or axioms)


3. The availability of prior knowledge (creating ontologies from scratch and/or updating existing ontologies) [9]

4. The input knowledge source: this data may be structured, referring to existing knowledge models (database schemas or existing ontologies); semi-structured, designating a mixture of structured data and free text (Web pages, Wikipedia, XML documents, etc.); or unstructured, relating to natural language text documents (PDF documents, the majority of HTML-based web pages, etc.)

Most of the knowledge available on the Web takes the form of natural language text, which is why this paper focuses especially on ontology learning from texts.

3 Ontology Learning Layer Cake

The process of extracting a domain ontology can be decomposed into a set of steps, summarized by [10] and commonly known as the 'ontology learning layer cake'. Figure 1 illustrates these steps.

Figure 1

Ontology learning layer cake (adapted from [10])

The first step of the ontology learning process is to extract the terms that are most important for describing a domain. A term is a basic semantic unit, which can be simple or complex. Next, synonyms among the previous set of terms should be extracted; this allows different words to be associated with the same concept, whether in one language or in several languages. These two layers are called the lexical layers of the ontology learning cake.

The third step is to determine which of the extracted terms are concepts. According to [10], a term can represent a concept if we can define its intension (giving the definition, formal or otherwise, that encompasses all the objects the concept describes), its extension (all the objects or instances of the given concept) and its lexical realizations (a set of synonyms in different languages). The extraction of concepts' hierarchies, our key concern, consists in finding the 'is-a' relationships, i.e. classes and subclasses, or hyperonyms. This phase is followed by non-taxonomic relation extraction, which consists in seeking any relationship that does not fit in the previously described taxonomic framework. The extraction of axioms is the final level of the learning process and is argued to be the most difficult one; to date, few projects have attacked the discovery of axioms and rules from text.

4 Related Work

There is a large body of work concerned with lightweight ontology learning from texts. In this paper, we particularly focus on taxonomy construction from texts relying on a wide-coverage online encyclopedia, namely Wikipedia. Few researchers have tackled this issue from a similar angle. Among the most promising works in the literature that have advanced research in this field is that of Ponzetto and Strube [4], who use the category system of Wikipedia as a conceptual network. Large scale taxonomies are derived using methods based on connectivity in the network as well as lexico-syntactic patterns in order to distinguish between is-a and not-is-a semantic links. The two researchers evaluate their resulting taxonomies by comparing them with manually annotated ontologies and by computing semantic similarity between words in commonly used benchmarking datasets. Kozareva and Hovy [11] propose a semi-supervised algorithm that automatically learns from the Web hyponym-hypernym pairs subordinated to a root concept, together with a Web-based concept positioning procedure to validate the learned is-a pairs. In addition, their system learns from scratch the integrated taxonomy structure of all the terms. The resulting taxonomies are compared with parts of the WordNet taxonomy; the proposed algorithm finds additional concepts that do not exist in WordNet but misses some concepts and links present in this lexical database. More recently, Fountain and Lapata [12] proposed an algorithm for inducing lexical taxonomies automatically from texts, combining Monte Carlo sampling with a maximum likelihood approach. They infer a hierarchy of taxonomic

terms from a graph they construct by taking into account the distributional similarity of the corpus terms in question. Their approach skips the term discovery step, since the authors assume they are given the terms for which they construct a taxonomy. An algorithm closer to our research work is given by Navigli et al. [2]. In their paper, the authors present an approach aimed at learning taxonomies from a domain corpus using the Web. The algorithm automatically learns both concepts and relations after extracting the domain terms, definitions and hypernyms, and the authors propose an algorithm to induce a taxonomy from the resulting graph. The step of looking for the concepts' definitions is based on the Web and uses the authors' Word-Class Lattices algorithm [13]. Aiming for a solution which constructs lightweight domain ontologies while avoiding the noise and scale of Web data, we opt for the collaboratively maintained resource that is Wikipedia. The novelty of our approach stems from the idea of using plain text articles instead of exploiting the categorical structure of Wikipedia, which had hitherto been widely used. In addition, the way we lop our graph in order to build the concepts' hierarchy is conceptually quite simple compared with other methods.

5 Concepts' Hierarchy Building Approach

Our approach tackles primarily the construction of concepts' hierarchies from text documents. We first perform terminology extraction using a tool dedicated to this task, TermoStat [14]. The initial terms are then the subject of a definitions' investigation within Wikipedia. By adapting the lexico-syntactic patterns defined by [15] to our case, the hyperonyms of our terms are learned. At this level, we could obtain the hyperonyms directly from ontologies built on top of Wikipedia (e.g. Yago^a, DBpedia^b) instead of resorting to the raw text of the articles. However, we need to extract the definition of each concept from the Wikipedia article in order to make the experts' interventions easy; therefore, to increase the efficiency of the developed application, a definition component is added to the tool's GUI. Using Wikipedia's plain text articles was also an opportunity to test some Java code we developed and to explore how to use JWPL (the Java Wikipedia Library). This free, Java-based application programming interface provides structured access to all the information in Wikipedia, including its articles as well as categories, redirects and link structure. The described process is iterative and comes to an end when a predefined maximum number of iterations is reached.

In parallel, our algorithm generates a graph, which is by definition a representation of a set of objects or items where links connect some pairs of these items. In other words, it is an ordered pair G = (N, A) comprising a set N of nodes or vertices together with a set A of arcs, edges or lines. The nodes of our graph are the domain terms and their hyperonyms, and each arc represents the subsumption relationship linking two concepts. Unfortunately, such a graph may contain cycles, and its nodes may have more than one hyperonym. The hierarchy we aim to build is the result of transforming this graph into a forest that respects the hierarchical structure of a taxonomy. Figure 2 gives the overall idea of the proposed approach.
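To make the iterative construction of G = (N, A) concrete, here is a minimal, self-contained Java sketch using plain collections (the actual prototype, described in Section 6, manages its graph with JUNG). The hard-coded definitions and the marker-matching helper are hypothetical stand-ins for the Wikipedia lookup and Hearst-style patterns detailed in Section 5.2; this is an illustration, not the authors' code.

```java
import java.util.*;

// Minimal sketch of the iterative graph expansion described above.
public class TaxonomyGraphSketch {

    // Subsumption arcs stored as concept -> set of hyperonyms (multiple parents still allowed here).
    static final Map<String, Set<String>> arcs = new HashMap<>();
    static final Set<String> nodes = new HashSet<>();

    // Hypothetical first sentences; the real system extracts them from Wikipedia articles.
    static final Map<String, String> FAKE_DEFINITIONS = Map.of(
            "hard hat", "A hard hat is a type of helmet, predominantly used in workplace environments.",
            "helmet", "A helmet is a form of protective gear, worn to protect the head.",
            "protective gear", "Protective gear refers to protective clothing, designed to shield the wearer's body.");

    public static void main(String[] args) {
        Set<String> frontier = new HashSet<>(Set.of("hard hat"));          // C0: initial candidate terms
        Set<String> general = Set.of("object", "element", "human being");  // Cgen: stop concepts
        int maxIterations = 3;

        nodes.addAll(frontier);
        for (int i = 0; i < maxIterations && !frontier.isEmpty(); i++) {
            Set<String> next = new HashSet<>();
            for (String concept : frontier) {
                if (general.contains(concept)) continue;                   // too general: no lookup
                String hyperonym = matchHyperonym(lookUpDefinition(concept));
                if (hyperonym == null) continue;                           // missing or unmatched: skip
                nodes.add(hyperonym);
                arcs.computeIfAbsent(concept, k -> new HashSet<>()).add(hyperonym);
                next.add(hyperonym);                                       // hyperonyms feed iteration i+1
            }
            frontier = next;
        }
        System.out.println(nodes.size() + " nodes in N");
        arcs.forEach((c, hs) -> hs.forEach(h -> System.out.println(c + "  is-a  " + h)));
    }

    static String lookUpDefinition(String concept) { return FAKE_DEFINITIONS.get(concept); }

    static String matchHyperonym(String definition) {
        if (definition == null) return null;
        for (String marker : new String[]{" is a type of ", " is a form of ", " refers to ", " is a "}) {
            int k = definition.toLowerCase().indexOf(marker);
            if (k >= 0) return definition.substring(k + marker.length()).split("[.,]")[0].trim();
        }
        return null;
    }
}
```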

5.1 Preliminary Steps

In order to carry out our approach, we must first go through the two lexical layers of ontology learning. The tool we used to retrieve the domain terminology is TermoStat^c. This web application was preferred for specific reasons: after testing both TermExtractor^d and TermoStat for collecting the domain terms, the second tool gave better results. This choice was confirmed by several domain experts who analyzed the sets of extracted domain terms. In fact, the way TermoStat processes the input corpus, by contrasting it with a general corpus such as the BNC (British National Corpus), yields a list of domain terms with a high degree of specialization; in other words, the selected items are highly relevant for the chosen domain and the least general. Afterwards, we identify the synonyms among the list of candidate terms; using thesaurus.com, synonyms among our domain terms are sought manually. This task is not as hard as it seems, since finding synonyms among specialized terms is easier than for general vocabulary. The third layer can be skipped in our context, as the construction of the concepts' hierarchy does not depend on the concepts' definitions; our algorithm mainly needs, for each set of synonyms (synset), the candidate term elected to represent it. The set of initial candidate terms is named C0.
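As a small illustration of what these lexical layers produce (the terms and groupings below are hypothetical, not taken from the HSE corpus), the following sketch collapses manually identified synonym groups into one representative term each, yielding C0.

```java
import java.util.*;

// Illustrative sketch: hand-made synonym groups are reduced to representative terms forming C0.
public class InitialConceptSet {
    public static void main(String[] args) {
        // Synonym groups gathered by hand (e.g. via thesaurus.com); first element = representative.
        List<List<String>> synsets = List.of(
                List.of("hard hat", "safety helmet"),
                List.of("safety harness", "fall-arrest harness"),
                List.of("earplug"));

        Set<String> c0 = new LinkedHashSet<>();
        Map<String, String> representativeOf = new HashMap<>();
        for (List<String> synset : synsets) {
            String rep = synset.get(0);
            c0.add(rep);
            for (String term : synset) representativeOf.put(term, rep);
        }
        System.out.println("C0 = " + c0);
        System.out.println("'safety helmet' maps to: " + representativeOf.get("safety helmet"));
    }
}
```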

5.2 Concepts' Hierarchy

The approach we propose belongs to two research paradigms: concepts' hierarchy construction for ontology learning, and the use of Wikipedia for knowledge extraction.

a http://www.mpi-inf.mpg.de/yago-naga/yago/

b http://dbpedia.org/About

c http://olst.ling.umontreal.ca/~drouinp/termostat_web/

d http://lcl.uniroma1.it/sso/index.jsp?returnURL=%2Ftermextractor%2F



Figure 2

Steps of the proposed approach

The achievement of our solution relies heavily on notions from graph theory.

a. Hyperonyms' Learning using Wikipedia

At the beginning of our algorithm, we have the following input data:

• G = (N, A) is an oriented graph, where N is the set of nodes and A is the set of arcs; initially N = C0 and A = ∅. Our objective is to extend this initial graph with new nodes and arcs: the former are the hyperonyms and the latter are the subsumption links. The extension of Ci, where i is the iteration index, is done using the concepts' definitions extracted from Wikipedia.

• Cgen is a set of general concepts for which we will not look for hyperonyms. These elements are defined by the domain experts and include, for example, object, element and human being.

S1 For each cj ∈ Ci, we check whether cj ∈ Cgen. If so, the concept is skipped; otherwise, we look for its definition in Wikipedia. The definition of a given term is always the first sentence of the paragraph preceding the TOC of the corresponding article. Three cases may occur:

1. The term exists in Wikipedia and its article is accessible. We then pass to the following step.

2. The concept is ambiguous, so our query leads to a Wikipedia disambiguation page. For instance, in the domain of HSE measures, the term 'headgear' stands for any protective clothing with which employees cover their heads; in dentistry, however, the same term is linked to the article titled "Orthodontic headgear", which describes an appliance widely used in that field. In such cases the search inevitably leads to a disambiguation page, and additional work would be needed to disambiguate the concept and find the relevant page. This issue is not our current concern.

3. The word for which we seek a hyperonym does not exist in the Wikipedia database. This occurs rarely; since Wikipedia is our only resource, we skip the element.

S2 To the definition of the given concept, we apply the principle of Hearst's patterns, collecting as exhaustive a list as possible of the key expressions we need. For instance, the definition may contain: is a, refers to, is a form of, consists of, etc. This procedure permits us to retrieve the hyperonym of the concept cj (see the sketch after this list). The new set of concepts is the input for the following iteration.

S3 Add to the graph G the nodes corresponding to the hyperonyms and the arcs that link them to the concepts they subsume.
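The following sketch, an illustration rather than the Tex2Tax code itself, shows how steps S1 and S2 can be realized with Jsoup: it fetches the first sentence of the text preceding the TOC and applies a handful of Hearst-style key expressions. The CSS selector and the pattern list are assumptions about the current Wikipedia page layout and about the expressions actually used; disambiguation pages are not handled, as in the paper.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of S1-S2: first sentence of a Wikipedia article + Hearst-style matching.
public class HypernymFromWikipedia {

    private static final Pattern HEARST = Pattern.compile(
            "\\b(?:is a form of|is a kind of|is a type of|refers to|is an?|consists of)\\s+([a-z][a-z -]*)");

    static String firstSentence(String term) throws IOException {
        String url = "https://en.wikipedia.org/wiki/" + term.trim().replace(' ', '_');
        Document doc = Jsoup.connect(url).userAgent("Tex2Tax-sketch").get();
        for (Element p : doc.select("#mw-content-text p")) {          // paragraphs before the TOC
            String text = p.text().trim();
            if (!text.isEmpty()) {
                int dot = text.indexOf(". ");
                return dot > 0 ? text.substring(0, dot + 1) : text;   // keep only the first sentence
            }
        }
        return "";
    }

    static String extractHyperonym(String definition) {
        Matcher m = HEARST.matcher(definition.toLowerCase());
        return m.find() ? m.group(1).trim() : null;   // null: no pattern matched
    }

    public static void main(String[] args) throws IOException {
        String def = firstSentence("Hard hat");
        System.out.println("Definition: " + def);
        System.out.println("Hyperonym:  " + extractHyperonym(def));
    }
}
```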


Figure 3

Wells’ drilling HSE graph

b. From Graph to Forest

The main idea shaping this stage shares a lot with [2]. The graph resulting from the preceding step has two imperfections: first, many concepts are connected to more than one hyperonym; second, the structure of the resulting graph is clearly cyclic, which does not match the definition of a hierarchy. An adequate treatment is therefore required to clean the graph of circuits as well as of multiple subsumption links, so that we finally obtain a forest respecting the structure of a hierarchy.

Figure 4

Weighted graph

Figure 3 illustrates how our algorithm treats a given graph to obtain a hierarchy. It is a piece of the whole graph obtained during the evaluation of our approach and represents a part of the wells' drilling HSE domain, namely PPE (Personal Protective Equipment). We have slightly modified the sub-graph so that it contains all the cases needed for the explanation. The green rectangles are the initial candidate concepts.

The resolution of the first imperfection obviously implies the resolution of the second one. We therefore apply the following solution:

1. Weight the arcs so as to favour long paths within the graph: the deeper an arc lies, the higher the value assigned to it (Figure 4).

2. We adapt Kruskal's algorithm (1956), which builds a maximum-weight spanning forest from a graph (Figure 5). Whenever edges have to be removed, we always keep the one with the highest weight, in order to preserve long paths that hold more detail, as sketched below. If significant arcs are removed, we reinsert them manually at the end of the algorithm.
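The sketch below gives a minimal, generic version of this graph-to-forest step: a Kruskal-style pass with union-find that keeps, among competing subsumption arcs, those with the highest depth-based weight and enforces single inheritance. The example arcs and weights are hypothetical, and the manual re-insertion of significant arcs mentioned above is not modelled.

```java
import java.util.*;

// Kruskal-style maximum-weight spanning forest with single inheritance (illustrative sketch).
public class GraphToForest {

    record Edge(String child, String parent, int weight) {}

    // Union-find with path compression, used to reject arcs that would close a cycle.
    static final Map<String, String> root = new HashMap<>();
    static String find(String x) {
        root.putIfAbsent(x, x);
        if (!root.get(x).equals(x)) root.put(x, find(root.get(x)));
        return root.get(x);
    }
    static boolean union(String a, String b) {
        String ra = find(a), rb = find(b);
        if (ra.equals(rb)) return false;
        root.put(ra, rb);
        return true;
    }

    public static void main(String[] args) {
        List<Edge> arcs = new ArrayList<>(List.of(
                new Edge("hard hat", "headgear", 3),
                new Edge("hard hat", "equipment", 1),              // weaker, redundant hyperonym
                new Edge("headgear", "protective clothing", 2),
                new Edge("protective clothing", "equipment", 1)));

        // Consider heavier (deeper) arcs first so that long, detailed paths are preserved.
        arcs.sort(Comparator.comparingInt(Edge::weight).reversed());

        List<Edge> forest = new ArrayList<>();
        Set<String> hasParent = new HashSet<>();
        for (Edge e : arcs) {
            if (hasParent.contains(e.child())) continue;           // single inheritance: one parent per node
            if (union(e.child(), e.parent())) {                    // drop arcs that would create a circuit
                forest.add(e);
                hasParent.add(e.child());
            }
        }
        forest.forEach(e -> System.out.println(e.child() + "  is-a  " + e.parent() + "  (w=" + e.weight() + ")"));
    }
}
```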

Figure 5

Final forest

We have thus reached the aim we set out to achieve: a forest that respects the structure of a concepts' hierarchy.



6 Evaluation


The task of manually evaluating the correctness of concept taxonomies is extremely hard, even for the experts of a given domain. According to the literature, ontology evaluation approaches are classified according to the need they satisfy: first, approaches that assess the 'quality' of an ontology by defining the criteria a 'good ontology' should meet; second, approaches based on a comparison with a benchmark ontology or with other ontologies; finally, approaches based on the 'utility' of an ontology in the context of an application.

Tex2Tax is the prototype we developed in Java in order to evaluate our approach. Jsoup^e is the API which allows us to access online Wikipedia; the same result can be reached using JWPL^f with the encyclopedia's dump^g. JUNG^h is the API we used for managing our graphs. Figure 6 shows the GUI of our prototype. We examine our research work from two angles. First, we study the extent to which the taxonomies we build retain the domain experts' knowledge; in this case, we refer to the wells' drilling HSE domain. Second, we assess the hierarchy structure with reference to a gold standard; this additional evaluation is made using partial Wordnet taxonomies and is applied to the animals, plants and vehicles domains.

e http://jsoup.org/

f http://code.google.com/p/jwpl/

g http://en.wikipedia.org/wiki/Wikipedia:Database_download

h http://jung.sourceforge.net/

6.1 Experiment 1: Manual Evaluation

Our evaluation corpus is a set of texts collected in the Algerian/British/Norwegian joint venture Sonatrach / British Petroleum / Statoil. This specialized corpus deals with the field of wells' drilling HSE. Throughout the experiment, interventions from the experts are inevitable: we worked with ten persons from the HSE department's staff in order to reach the consensus required for evaluating our taxonomy. The terminology extraction and synonym retrieval phases yielded a collection of 259 domain concepts. The final graph is formed of 516 nodes and 893 arcs. After the cleaning, the concepts' forest holds 323 nodes, among which 211 are initial candidate terms, and 322 remaining arcs. In order to study the taxonomy structure, we calculate the compression ratio of the nodes, which is 0.63 (323/516), and that of the arcs, which equals 0.36 (322/893). These two values show that lopping our graph deleted nodes, which in turn implies the suppression of arcs.

6.1 Experiment 1: Manual Evaluation Firstly, our evaluation corpus is a set of texts that are collected in the Algerian/British/Norwegian joint venture Sonatrach / British Petroleum / Statoil. This specialized corpus deals with the field of wells’ drilling HSE . Throughout our experiment, interventions from the experts are inevitable. We worked with ten persons from the HSE department’ staff in order to reach the required consensus for evaluating our taxonomy. The terminology extraction phase and the synonyms retrieving have given a collection of 259 domain concepts. The final graph is formed by 516 nodes and 893 arcs. After having done the cleaning, the concepts’ forest holds 323 nodes, among them 211 are initial candidate terms. The number of remaining arcs is of 322. In order to study the taxonomy structure we calculate the compression ratio for the nodes which is 0.63(323/516) and the one of the arcs which equals to 0.36(322/893). These two values show that after lopping our graph, nodes were deleted which implies the suppression of e http://jsoup.org/ f http://code.google.com/p/jwpl/ g http://en.wikipedia.org/wiki/Wikipedia:Database h http://jung.sourceforge.net/


LP = e_relevant / e_all
LR = e_relevant / b_relevant

where e_relevant is the number of relevant extracted terms, e_all the total number of extracted terms and b_relevant the number of relevant terms in the benchmark.

The total number of relevant terms in the benchmark is also given by the experts: this value was obtained from a domain encyclopedia and corresponds to the number of its entries (566). The resulting values are respectively:

LP = 0.65 (211/323)
LR = 0.37 (211/566)

The recall of our taxonomy is relatively low. This is mainly due to the fact that the encyclopedia we used as a benchmark contains a considerable number of domain terms that are obsolete and do not appear at all in the corpus we used. In addition, some terms do not exist in the Wikipedia database, and the graph lopping is also responsible for the loss of some nodes containing appropriate domain vocabulary.
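As a quick, illustrative sanity check, the computation behind these figures can be reproduced in a few lines of Java from the counts reported above (211 relevant extracted concepts, 323 extracted nodes, 566 benchmark entries).

```java
// Reproduces the LP and LR values reported in the text.
public class TaxonomyMetrics {
    static double precision(int relevantExtracted, int allExtracted) {
        return (double) relevantExtracted / allExtracted;
    }
    static double recall(int relevantExtracted, int relevantInBenchmark) {
        return (double) relevantExtracted / relevantInBenchmark;
    }
    public static void main(String[] args) {
        System.out.printf("LP = %.2f%n", precision(211, 323));  // 0.65
        System.out.printf("LR = %.2f%n", recall(211, 566));     // 0.37
    }
}
```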

6.2 Experiment 2: Evaluation against a Gold Standard

A second way to evaluate our approach is to build a taxonomy for a given domain of interest and compare it with a partial hierarchy of Wordnet. This treatment is widely used in recent research works [2, 11], and it is natural to choose the same domains as these works in order to compare results. The specific domains to which we apply our taxonomy extraction algorithm are animals, plants and vehicles. Unlike other researchers who reuse the same set of terms, in the first stage of this experiment we run the first step of our approach in order to extract the terminology of the selected domain, vehicles; the corpus was provided by the same joint venture, since one of its departments manages the organization's means of transport and holds numerous documents on this field. After that, we use the terminology provided by [11] for animals, plants and vehicles so as to allow a stronger analysis. For each of these two stages, a comparative analysis is given to assess the quality of our approach.


Figure 6

Tex2Tax prototype's GUI

Table 1

New nodes and arcs, precision and recall: a comparison.

Metric         Tex2Tax           K and H 2010    N et al. 2011
New concepts   97                -               -
New links      53                -               -
Nodes CR*      0.39 (123/313)    -               -
Edges CR*      0.31 (122/388)    -               -
Recall         51.5              60.0            48.7
Precision      92.2              99.0            90.9

* CR: compression ratio.

Table 1 gives the results of our experiment on the Wordnet sub-hierarchy for vehicles. We selected the research works cited as examples in the previous paragraph for comparison, since they report the highest recall and precision values to date. The results include the new concepts that do not exist in Wordnet, the new is-a relations, the graph compression ratios, and the taxonomy's recall and precision. The statistics concerning the newly discovered nodes and edges that did not exist in Wordnet show that using Wikipedia for hyperonym search is a judicious choice. In fact, the initial set of vehicle-specific terms extracted from the textual corpus contains 113 items; the real enrichment of this set is due to the use of the Wikipedia articles. In addition, the number of new is-a links in the resulting taxonomy is quite good compared with the partial Wordnet hierarchy.

However, although our recall and precision slightly exceed those of Navigli et al., they are significantly lower than Kozareva and Hovy's. This is due to the initial set of terms, which is not the same as the one used in the other two works: the corpus used to extract the initial domain terminology is not rich enough to obtain a comparable number of terminology items, so our set of domain terms differs in content and is smaller in cardinality than the one used by the other researchers. The quality of our taxonomy is nevertheless similar to theirs, given that we used a reduced number of initial terms and only Wikipedia rather than the whole Web; the recall is evidently lower because the Web contains far more resources than Wikipedia. As a matter of fact, the richness of our final taxonomy depends heavily on the initial set of candidate terms, and extending this collection from the start improves the scale of the final taxonomy. This is what we do in the second part of our experiment, where we use the same collection of initial domain terms as the other approaches. Table 2 gives the precision and recall values we obtained for the animals, plants and vehicles domains. The results obtained by using the same set of terms as the other approaches for the vehicles sub-hierarchy are better, which indicates that the only reason for the low precision and recall values previously obtained was the insufficient number of domain terms contained in the chosen corpus.



Table 2

Precision and recall compared with [11] and [2]

Domain     Tex2Tax          K and H 2010     N et al. 2011
           Prec., Recall    Prec., Recall    Prec., Recall
Animals    97.5, 36.0       97.3, 38.0       97.0, 43.7
Plants     97.2, 35.1       97.2, 39.4       97.0, 38.3
Vehicles   94.3, 51.0       99.0, 60.0       90.9, 48.7

The recall is still lower than in the other works because we use Wikipedia instead of the Web, but the precision is now clearly better than the one resulting from the other experiments. The use of the same term set as the other approaches reveals that our approach outperforms Web-based approaches in precision, mainly because the Web is much noisier than Wikipedia. The vehicles taxonomy is an exception to these observations: the requirement of our algorithm to produce a taxonomy rather than a graph, as in [11], lowers the precision, since the vehicles graph is the one in which we lose the most relevant concepts when lopping it. Overall, the obtained results are very encouraging, given that we used Wikipedia instead of the Web and that our algorithm starts from scratch.

7 Conclusion

Despite all the work done in the field of ontology learning, much cooperation, many contributions and many resources are still needed before this process can really be automated. Our approach is one of the few works that harness the collaboratively constructed resource Wikipedia in order to build a domain taxonomy. We have described how we use plain text articles to learn hyperonyms. The results achieved, based on the exploitation of Hearst's lexico-syntactic patterns and on graph pruning, are promising. We intend to improve our work by addressing other issues, such as enriching the search base with the Web and exploiting the categorical system of Wikipedia in order to tackle higher levels of the ontology learning process, such as non-taxonomic relations. A further suggestion for future work is to resolve the disambiguation problem we face when we reach Wikipedia disambiguation pages. Finally, multi-lingual ontology learning is an active research area which we have only touched upon.

Acknowledgment We are thankful to the Sonatrach / British Petroleum / Statoil joint venture’s President and its Business Support Manager for giving us the approval to access the wells’ drilling HSE corpus.

References

[1] Pretorius, A. J. (2004): 'Ontologies - Introduction and Overview'. Adapted from: Lexon Visualisation: Visualising Binary Fact Types in Ontology Bases, Chapter 2, Unpublished MSc Thesis, Vrije Universiteit Brussel, Brussels.

[2] Navigli, R., Velardi, P. and Faralli, S. (2011): A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch. Proc. of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, 19-22 July, pp. 1872-1877.

[3] Zesch, T., Müller, C. and Gurevych, I. (2008): Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco.

[4] Ponzetto, S. P. and Strube, M. (2007): Knowledge derived from Wikipedia for computing semantic relatedness. Journal of Artificial Intelligence Research, Vol. 30, pp. 181-212.

[5] Strube, M. and Ponzetto, S. P. (2006): WikiRelate! Computing semantic relatedness using Wikipedia. Proceedings of the 21st National Conference on Artificial Intelligence, Boston, Mass., 16-20 July 2006, pp. 1419-1424.

[6] Ponzetto, S. P. and Strube, M. (2007): Deriving a large scale taxonomy from Wikipedia. Proceedings of the 22nd Conference on the Advancement of Artificial Intelligence, Vancouver, B.C., Canada, 22-26 July 2007, pp. 1440-1445.

[7] Nastase, V. and Strube, M. (2008): Decoding Wikipedia Categories for Knowledge Acquisition. Proceedings of AAAI '08.

[8] Zouaq, A. and Nkambou, R. (2010): A Survey of Domain Ontology Engineering: Methods and Tools. Advances in Intelligent Tutoring Systems, pp. 103-119.

[9] Benz, D. (2007): Collaborative ontology learning. Master's thesis, University of Freiburg.

[10] Buitelaar, P., Cimiano, P. and Magnini, B. (2005): Ontology Learning from Text: Methods, Evaluation and Applications. IOS Press, Frontiers in Artificial Intelligence and Applications, Vol. 123, July 2005.

[11] Kozareva, Z. and Hovy, E. (2010): A Semi-Supervised Method to Learn and Construct Taxonomies using the Web. EMNLP'10: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing.

[12] Fountain, T. and Lapata, M. (2012): Taxonomy induction using hierarchical random graphs. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Montréal, Canada, June 2012.

[13] Navigli, R. and Velardi, P. (2010): Learning Word-Class Lattices for Definition and Hypernym Extraction. Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, 11-16 July 2010, pp. 1318-1327.

[14] Drouin, P. (2002): Acquisition automatique des termes : l'utilisation des pivots lexicaux spécialisés. Thèse de doctorat, Montréal: Université de Montréal.

[15] Hearst, M. A. (1992): Automatic acquisition of hyponyms from large text corpora. Proceedings of the 14th International Conference on Computational Linguistics (COLING 1992), pp. 539-545.
