Ontologizing Concept Maps using Graph Theory

Amal Zouaq
School of Interactive Arts and Technology, Simon Fraser University Surrey, 13450 102 Ave. Surrey, BC V3T 5X3, Canada
[email protected]

Dragan Gasevic
School of Computing and Information Systems, Athabasca University, 1 University Drive, Athabasca, AB T9S 3A3, Canada
[email protected]

Marek Hatala
School of Interactive Arts and Technology, Simon Fraser University Surrey, 13450 102 Ave. Surrey, BC V3T 5X3, Canada
[email protected]

ABSTRACT
Given the new challenges of open and unsupervised information extraction, there is a need to identify important and relevant knowledge structures (concepts and relationships) in the vast amount of extracted data and to filter out the noise that results from unsupervised information extraction. This is generally referred to as the ontologization task. This paper uses measures from graph theory, such as PageRank, Betweenness, and Degree, to identify these key elements. We also propose a combination of metrics for ranking concepts and relationships. Our approach shows effective results in terms of precision compared with other standard measures for weighting concepts and relationships, such as TF*IDF or frequency of co-occurrences.

Categories and Subject Descriptors
I.2 [Artificial Intelligence]: I.2.7 Natural Language Processing - Text analysis; E.1 [Data Structures]: Graphs and networks; I.5 [Pattern Recognition]

General Terms
Algorithms, Management, Experimentation.

Keywords
Ontologization, concept and relation importance, precision, metrics, graph theory, ontology.

1. INTRODUCTION
One of the most important issues for the development of the Semantic Web is the ability to create conceptual models, in the form of ontologies, from textual data. In fact, the explosion in the amount of electronic data, in domain-dependent corpora and on the Web, is a valuable knowledge source. However, for this source to be useful, more than mere information extraction is needed. Recent efforts in the knowledge extraction community have encouraged the development of open information extraction [2, 8]; that is, text mining software that does not rely on predefined templates but is able to extract knowledge based on what it reads (unsupervised extraction).


However, this kind of extraction generally suffers from the large number of proposed concepts, which hinders their usability from a user perspective [10], and results in noisy data that should be filtered [9, 13]. In fact, the "blinder" the approach is, the more likely it is to generate noisy knowledge. One way of filtering the extracted information is through its ontologization [18]. In general, we can define the ontologization task as the ability to build a bridge between the natural language level and the abstract/conceptual level. Based on a set of terms and relational triples extracted from texts, the ontologization task tries to identify the most important textual relationships, assertions and terms in the vast amount of extracted data, and often assigns a formal sense to these items (terms and relationships) using an available structure such as an ontology or a dictionary. This filtering of the important knowledge (henceforth called ontologization) is a crucial issue for large-scale ontology learning based on corpora and the Web.

This paper contributes to the ontologization task and advocates the extraction of meaningful structures from the data without using any external semantic resource (dictionary, frames, taxonomy, ontology, etc.). On the one hand, such external semantic resources are hard to build and maintain. On the other hand, they may not appropriately cover the domain of interest (e.g., WordNet, which is often used, is not a domain-specific lexicon). Following [27], we believe that meaning should "emerge" from texts and that relevant concepts and relationships should come from the corpus itself.

Starting from a set of concept maps extracted using our open ontology learning tool OntoCmaps, the successor of our previous work [26, 27], this paper exploits metrics from graph theory to rank the concepts and relationships that are extracted from texts and that form initial concept maps (Cmaps). Initial Cmaps are a representation of the entire textual corpus under study¹. However, not all the concepts and relationships in such Cmaps may be needed for the final model (e.g., an ontology or a final concept map) to be developed. Thus, the graph-theory-based ranking proposed in this paper should help filter the most relevant concepts and relationships.

¹ In fact, it is more appropriate to use terms or words, rather than concepts, as the constitutive parts of those initial concept maps. Thus, in the rest of the paper, when we refer to initial concept maps, we will always assume that they consist of terms and relationships. Only those terms that are filtered by the proposed graph-based metrics are promoted into concepts.

In particular, in this paper we are interested in investigating the increased precision of the filtered concepts and relationships. To our knowledge, the use of graph theory has not previously been proposed for the ontologization task. Our assumption is that these metrics may provide some evidence of the relevance of concepts and relationships without resorting to any external knowledge source. Our idea is also to compare the results of graph-theory metrics, and of a combination of metrics, with those of other standard weighting measures usually used in the ontology learning and knowledge extraction communities (term frequency, TF*IDF, pointwise mutual information and frequency of co-occurrences). In this context, the objectives of this paper are:

1. To propose a filtering method based on metrics and a combination of metrics (a voting scheme) to rank concepts and relationships and extract the most relevant ones;
2. To determine which metrics are the most likely to give the most precise results;
3. To compare the results of each metric and of the voting scheme with those of standard weighting schemes (e.g., TF*IDF, pointwise mutual information, and frequency of co-occurrences); and
4. To assess the results of all the metrics against a human gold standard and a random baseline.

This paper is organized as follows. After this introduction, Section 2 presents a set of related works and positions our proposal. Section 3 briefly presents the information extraction process, our hypotheses regarding concept and relationship importance, and the metrics for ranking concepts and relations based on graph theory. Section 4 describes our experiments and compares the results with other standard measures as well as with a human gold standard. Finally, Section 5 summarizes the paper and discusses future work.

2. STATE OF THE ART
Ranking and filtering are very important issues in the information extraction and ontology learning communities. Traditionally, in statistical and machine learning approaches, ontologizing concepts and relationships has been performed by linking them to a semantic repository [18] and by estimating the probability of the relationships [22] or concepts [3, 4] based on standard measures from information retrieval, such as TF*IDF [20]. For example, Text2Onto [4], a state-of-the-art ontology learning tool, proposes TF*IDF, among other metrics, to weight the extracted concepts. Clustering methods [18, 19] have also been used to find categories in the data that are then considered as concepts or patterns. Knowledge-based approaches often rely on WordNet [15, 18] to annotate the data. For instance, Espresso [18] uses a clustering technique based on the WordNet taxonomy to create reliable extraction patterns. Espresso's ontologization task then consists of assigning a WordNet sense to the terms linked by the relationships that hold. There are also highly supervised machine learning approaches for concept and relationship learning, which may lead to very accurate results but which suffer from their dependence upon hand-labeled examples.

One drawback of the aforementioned works is that they rely on a knowledge base or on clustering examples to address the ontologization issue.

Due to the extensive effort required to build and maintain such knowledge structures, and due to the inadequacy of some of these structures (e.g., WordNet) for representing domain knowledge, it may be of great interest to provide an unsupervised solution. In this line of research, probabilities and weights have been proposed to filter important terms in ontology learning approaches [3, 4], but they generally rely on measures such as TF*IDF or frequency of co-occurrences rather than on the structural characteristics of graphs.

Our motivation for using metrics from graph theory to identify important concepts and relationships follows from a number of initiatives in the text mining community showing that these metrics can be beneficial for extracting important data from network structures. In fact, measures such as degree, eigenvector, betweenness and closeness centrality have proven useful in many automatic extraction and filtering tasks, such as the discovery of important gene-disease associations [17], noun phrase prediction [25], word sense disambiguation [16], topic identification [5], the ranking of Actor-Concept-Instance networks [14] and ontology analysis [1, 11, 26, 27]. To the best of our knowledge, however, there are very few, if any, attempts to exploit such graph-based metrics to filter the results of an information extraction system and to identify important concepts and relationships in graph structures. One proposal in this direction is our own work [26, 27], which uses the out-degree of a term in a concept map (the number of edges whose source is the term) as an indicator of the importance of the term: all terms whose out-degree is greater than a given threshold are considered concepts. However, that work does not compare the out-degree with other kinds of metrics, such as Betweenness centrality or PageRank, to rate the effectiveness of each of these metrics for evaluating concept importance.

3. PROPOSITION

3.1 OntoCmaps
In order to develop our idea, we used OntoCmaps, our open ontology learning tool and the successor of our previous system TEXCOMON [26, 27], to build the initial concept maps, denoted CM_I = (T_I, R_I), which are composed of terms T_I (as already indicated in footnote 1) and labeled relationships R_I. OntoCmaps relies on syntactic patterns based on dependency grammars [6] to create CM_I. These patterns are coupled with rules that create semantic representations from the syntactic structures. An example of a pattern is subject-verb-object, which can be mapped to a predicate verb(subject, object), as in the sentence "ontologies describe domain knowledge": describe(ontologies, domain knowledge). OntoCmaps does not rely on any predefined template or domain-dependent information to extract its semantic representations.

The knowledge extraction is performed on each key sentence of a document, where a key sentence is one that contains keywords identified through statistical measures such as TF*IDF. Syntactic patterns are then used to extract semantic representations, in the form of triples, associated with each key sentence. Once the whole corpus has been processed, an integration step is performed by creating concept maps, CM_I, around the extracted terms. A map around a term t ∈ T_I contains the relationships whose subject is t. The number of obtained concept maps may differ depending on the corpus and the detected patterns. If the corpus is well chosen, OntoCmaps is likely to produce one big connected graph with possibly a few disconnected components.
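To make the subject-verb-object pattern described above concrete, the following minimal sketch (in Python, purely for illustration) maps an already-parsed subject-verb-object tuple to a predicate triple. The function name and the tuple representation are our own assumptions for this example; OntoCmaps itself operates on full dependency parses [6] rather than on pre-split tuples.

```python
# Illustrative sketch only: maps a parsed subject-verb-object pattern to a
# predicate(subject, object) triple. The tuple input is assumed; OntoCmaps
# itself works on full dependency parses [6].
def svo_to_triple(subject: str, verb: str, obj: str) -> tuple:
    """Return the triple verb(subject, obj) as a (predicate, arg1, arg2) tuple."""
    return (verb, subject, obj)

# "Ontologies describe domain knowledge" -> describe(ontologies, domain knowledge)
print(svo_to_triple("ontologies", "describe", "domain knowledge"))
```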

Fig. 1 shows the proposed pipeline as well as the tools used to support the discussed process (given on the right-hand side of the figure): (1) a domain corpus is processed by the open information extraction tool (OntoCmaps); (2) the resulting initial concept maps (Cmaps), possibly complemented by a Cmaps repository, go through the concept map ranking step (supported by the Java Universal Network/Graph Framework [12]), whose output is an ontology.

[Figure 1. Conceptual Architecture: domain corpus -> (1) open information extraction tool (OntoCmaps) -> (2) initial concept maps (Cmaps) / Cmaps repository -> concept map ranking (Java Universal Network/Graph Framework [12]) -> ontology]

Once concept maps are obtained, there is typically a need to rank the elements (terms and relationships) of CM_I in order to obtain a target model (e.g., an ontology). This target model is denoted CM_F. The ranking step is the primary focus of this paper. The input to this ranking step (shaded in Fig. 1) consists of domain-specific concept maps that can originate from two sources: (1) maps emerging from a domain corpus (texts) through a tool such as OntoCmaps (i.e., CM_I), or (2) existing domain concept maps from available repositories. The output consists of the filtered concepts and relationships and is denoted CM_F = (C_F, R_F). Note that once an important term t ∈ T_I is filtered, it becomes a concept c ∈ C_F in the final CM_F generated by OntoCmaps, which is a fairly standard approach for ontology learning tools. With respect to relationships, we consider conceptual relationships (object properties in OWL terms) and hierarchical relationships (subclasses in OWL terms). Once the ranking is performed, the most relevant concepts and relationships can be exported into an ontology, leading to a formal model of the domain. This model should be validated and updated by the domain expert prior to or after the export phase, depending on the expert's preferences.

The proposed architecture (Fig. 1) is independent of any particular tool or framework. For example, even though we used JUNG [12] for computing graph theory metrics, this API could easily be replaced by any other similar API. Similarly, based on the output of any knowledge extraction tool, or using a manual process that identifies terms and linking relationships in corpora, it is possible to create concept maps that can be further ranked using our approach. Of course, using a homogeneous domain corpus may greatly reduce the difficulty of this concept map creation with automatic extraction tools, as it avoids dealing with difficult issues such as word sense disambiguation. Here, we assume that the corpus and the Cmaps repositories are about the same domain. It is also important to note that the final outcome ontology is not limited to CM_F. In fact, our previous work [27] and current research also introduce rules for the creation of OWL ontologies, which not only consist of concepts and relationships but also include some more advanced axioms (such as equivalent classes). However, filtering terms and relationships is one big challenge for any ontology learning tool, and that is why we focus on this contribution in this paper.
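As an illustration of how such concept maps can be represented for ranking, the following sketch builds a small labeled directed graph from extracted triples and recovers the map around a term. We use the networkx library here as a stand-in; the actual implementation relies on JUNG [12], and the triples below are invented for the example.

```python
# Sketch of the integration step: build an initial concept map CM_I as a
# labeled directed graph from extracted triples. networkx stands in for
# JUNG [12]; the triples are invented for illustration.
import networkx as nx

triples = [
    ("describe", "ontology", "domain knowledge"),
    ("contain", "ontology", "class"),
    ("has", "class", "instance"),
    ("structure", "domain knowledge", "class"),
]

cm_i = nx.DiGraph()
for predicate, source, destination in triples:
    cm_i.add_edge(source, destination, label=predicate)

# The map around a term t contains the relationships whose subject is t.
def map_around(graph: nx.DiGraph, term: str) -> list:
    return [(term, d, graph[term][d]["label"]) for d in graph.successors(term)]
```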

3.2 Hypotheses
The approach presented in this paper is an experimental study that relies on a set of hypotheses to rank terms and relationships based on the structure of concept maps. To apply graph-based metrics, terms in concept maps are considered graph nodes, and relationships are considered edges. We propose a set of hypotheses that draw their roots mainly from the notion of centrality, which is essential in the analysis of networks. These hypotheses state that (a sketch illustrating the node-level measures follows the list):

1. The importance of a concept may be a function of the number of relations that start from and end at the concept. This can be measured using the Degree of a node, which is computed from the number of edges incident to that node (Hypothesis 1).
2. The importance of a concept may be a function of its centrality in the graph. This can be measured using Betweenness centrality. The betweenness centrality of a node A can be computed as the proportion of shortest paths connecting all pairs of other nodes in the graph that pass through A (Hypothesis 2).
3. The importance of a concept may be a function of the number of important concepts that point to it. This can be measured using the PageRank of a node, which is based on the number of links that point to the node, while taking into account the importance of the source nodes of these links (Hypothesis 3).
4. The importance of a relationship may be a function of its source and destination concepts. Here, important relationships are those that occur between two important concepts (Hypothesis 4).
5. The importance of a relationship may be a function of its centrality in the graph. Betweenness centrality can also be used to measure the centrality of a given edge (Hypothesis 5).
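The node-level measures behind Hypotheses 1-3 can be computed directly with an off-the-shelf graph library. The sketch below does so with networkx on the cm_i graph from the previous sketch; again, this is an illustrative stand-in for the JUNG-based implementation [12].

```python
# Node-level metrics from Hypotheses 1-3, computed on the cm_i graph built
# in the previous sketch (networkx stands in for JUNG [12]).
degree = dict(cm_i.degree())                   # Hypothesis 1: edges incident to each node
betweenness = nx.betweenness_centrality(cm_i)  # Hypothesis 2: shortest-path centrality
pagerank = nx.pagerank(cm_i)                   # Hypothesis 3: importance of incoming links
```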

3.3 Metrics
Based on our hypotheses, we computed three metrics, Degree, Betweenness and PageRank, to assign weights to terms. In order to set up a threshold, we defined the following rule: a term must have a weight greater than or equal to the mean value of the current metric to be considered a candidate term for promotion into a concept and inclusion in CM_F. The idea behind the mean value is to retain only the nodes (i.e., terms) that are already quite important, instead of experimentally defining thresholds that may change from one corpus to another. Using this mean threshold is appropriate in this paper given that our goal is to identify the most precise metrics. However, it would be interesting to compare experimental thresholds with these mean values and to see whether the mean values have any impact on the overall efficiency of the proposed metrics. In fact, taking the mean value as a threshold assumes that less than half of the extracted terms are important, which might be too restrictive.

We also created a voting scheme where a term is considered important if it is a candidate term for all three metrics (Degree, Betweenness, and PageRank): T_Voted = T_Degree ∩ T_Betweenness ∩ T_PageRank. In this voting scheme, the three metrics are equally important.
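The following sketch illustrates the mean-value threshold and the voting scheme, reusing the score dictionaries from the previous sketch; the helper name candidates is our own.

```python
# Sketch of the mean-value threshold and of the voting scheme
# T_Voted = T_Degree ∩ T_Betweenness ∩ T_PageRank, reusing the degree,
# betweenness and pagerank dictionaries computed above.
from statistics import mean

def candidates(scores: dict) -> set:
    """Keep terms whose weight is >= the mean value of the metric."""
    threshold = mean(scores.values())
    return {term for term, score in scores.items() if score >= threshold}

# A term is voted important only if all three metrics select it.
voted_terms = candidates(degree) & candidates(betweenness) & candidates(pagerank)
```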

Additionally, once important terms have been computed using the voting scheme and included in the set of concepts C_F, this set is enriched as follows: if an important term is part of a taxonomical link (as a child or as a parent) extracted by OntoCmaps, then its ancestor or descendant is added as a concept, even if it was not selected by the voting scheme. The same rule applies to instance links. We apply these two additional rules to increase the number of important terms involved in taxonomical and instance relationships. In fact, these links are very important for building a conceptual model and for reasoning. These two rules also have an impact on the rating of relationships by allowing, for instance, the selection of taxonomical links between important terms (see the first measure for rating relationship importance below).

Besides these metrics from graph theory, we computed the term frequency and TF*IDF, two well-known metrics in information retrieval, for comparison purposes.

In addition to the metrics for ranking important terms, we also defined a number of metrics to rate the importance of relationships, as stated in Hypotheses 4 and 5 (see the sketch at the end of this subsection):

1. The first metric selects all the relationships that occur between important terms as important relationships;
2. The second metric ranks relationships based on betweenness centrality, where candidate relationships are chosen if their centrality is greater than or equal to the mean centrality of the graph's relationships.

These two measures are based on the structural characteristics of the graph and can be considered as evidence based on the corpus from which the graph emerges. To provide a comparative basis, we also implemented standard measures for relationship ranking, namely frequency of co-occurrences and Pointwise Mutual Information. The frequency of co-occurrences metric assigns co-occurrence weights using the Dice coefficient, a standard measure of semantic relatedness. Similarly to the proposed graph-based metrics, this measure is based on the corpus, but it ranks relationships using the frequency of co-occurrence of the nodes involved in the relationships. In this case, the importance of a relation r between S (Source) and D (Destination) is calculated using the following formula:

Dice(S, D) = 2 * F(S, D) / (F(S) + F(D))

where F(S, D) is the number of co-occurrences of S and D in a given context (here, an extracted relationship), F(S) is the frequency of occurrence of S in the corpus, and F(D) is the frequency of occurrence of D in the corpus. Again, the selected relationships are those whose Dice coefficient is greater than or equal to the mean.

Finally, we implemented another standard measure, Pointwise Mutual Information (PMI). This metric is usually used to measure the semantic relatedness between two terms. To our knowledge, in the domain of ontology learning, PMI has been used to extract synonyms [23] or to evaluate the probability of an extraction pattern [7, 18], but not to rate the probability of an extracted conceptual relationship, as we propose here. In our experiments, we relied on the Measure of Semantic Relatedness Server [24] to calculate the PMI, using the PMI-G metric (based on Google).
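The sketch below illustrates, under the same assumptions as the previous sketches, the two graph-based relationship measures and the Dice baseline; the corpus frequency counts passed to dice are assumed to be precomputed.

```python
# Sketch of the relationship measures: important_relationships implements the
# first metric (Hypothesis 4), central_relationships the second (Hypothesis 5),
# and dice the comparative baseline. Corpus frequencies are assumed precomputed.
from statistics import mean
import networkx as nx

def important_relationships(graph: nx.DiGraph, important_terms: set) -> set:
    """Hypothesis 4: relationships whose source and destination are both important."""
    return {(s, d) for s, d in graph.edges()
            if s in important_terms and d in important_terms}

def central_relationships(graph: nx.DiGraph) -> set:
    """Hypothesis 5: edges whose betweenness is >= the mean edge betweenness."""
    edge_bc = nx.edge_betweenness_centrality(graph)
    threshold = mean(edge_bc.values())
    return {edge for edge, score in edge_bc.items() if score >= threshold}

def dice(f_sd: int, f_s: int, f_d: int) -> float:
    """Dice(S, D) = 2 * F(S, D) / (F(S) + F(D))."""
    return 2 * f_sd / (f_s + f_d)
```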

4. EVALUATION

4.1 Description of the Experiment
The first goal of our experiments was to test our stated hypotheses and to evaluate the effectiveness of the proposed graph-based metrics at improving the precision of the ranking results in the ontologization task. The second goal of the study was to compare the performance of the graph-based ranking metrics to the metrics used in state-of-the-art ontology learning research.

Due to the well-known problem of evaluating the ontologization task, we decided to build our own corpus (30,000 words) and gold standard from a set of SCORM manuals [21]. In order to help the domain expert build the domain ontology gold standard, we ran the OntoCmaps tool and the Text2Onto tool [4] (a standard ontology learning tool) on the same corpus. The OntoCmaps tool has the option of exporting all the extracted terms and relationships without additional filtering, and we used it to produce a non-filtered domain ontology D1. With Text2Onto, we generated an ontology (D2) using all the algorithms (for extracting concepts, instances, taxonomical links and relations) that did not require external resources such as Google and that relied only on the corpus itself. We automatically merged these two ontologies (D1 and D2) into a single ontology without any other filtering. Then, we asked a domain expert to evaluate and prune the obtained ontology to obtain a satisfying gold standard representing the SCORM domain². The objective here was to provide a fair comparison base for the two tools, OntoCmaps and Text2Onto, and to evaluate their results on the same corpus and gold standard.

The next step was to re-run the OntoCmaps tool on the same corpus, but this time performing the automatic graph-based filtering of the results. These results were then compared against the domain expert's ontology and rated according to two well-known measures from information retrieval:

Precision = items the metric identified correctly / total number of items generated by the metric

Recall = items the metric identified correctly / total number of relevant items (which the metric should have identified)

In order to provide a comparative basis, we also computed the standard metrics usually used for weighting terms and relationships in ontology learning tools, as previously explained, namely term frequency, TF*IDF, PMI-G and frequency of co-occurrences, as well as a naïve random baseline, where terms and relationships were chosen randomly.
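As a sketch, both measures (plus the F-measure reported in the tables below) reduce to simple set operations once the extracted items and the gold standard are represented as sets; the variable names here are our own.

```python
# The evaluation measures as set operations; `extracted` and `gold` are
# assumed to be sets of items (concepts or relationship triples).
def precision(extracted: set, gold: set) -> float:
    return len(extracted & gold) / len(extracted) if extracted else 0.0

def recall(extracted: set, gold: set) -> float:
    return len(extracted & gold) / len(gold) if gold else 0.0

def f_measure(extracted: set, gold: set) -> float:
    p, r = precision(extracted, gold), recall(extracted, gold)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```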

4.2 Results
The results of our experiments are shown in Tables 1 and 2. While both precision and recall are considered important in the information retrieval community, our focus here is on precision, that is, the ability to extract and filter correct items from a vast amount of noisy data. Our results indicate that using metrics from graph theory, and especially our voting scheme, may help identify concepts (> 81% precision) and conceptual relationships (> 60% precision). Compared with standard measures such as TF*IDF, frequency of co-occurrences or PMI, graph-based metrics give better precision results. Although the precision rates could still be enhanced, we consider the obtained precisions reasonable, and certainly an improvement considering that the state-of-the-art metrics (TF*IDF, frequency of co-occurrences, etc.) obtained worse results on the same corpus.

² Available at http://azouaq.athabascau.ca/Corpus/SCORM/SCORMCorpus.rar

Table 1. Concept filtering results

                             Precision   Recall   F-measure
Voting Scheme (VS) concept       81.96    38.93       52.79
Betweenness concept              74.57    14.84       24.76
Page Rank concept                70.25    22.46       34.04
Degree concept                   71.40    25.43       37.51
TF*IDF concept                   44.50    16.12       23.67
TF concept                       60.20    15.51       24.67
Random concept                   47.36    21.18       29.27

Regarding concept filtering (Table 1), metrics from graph theory outperform standard measures such as TF or TF*IDF, as well as the random baseline. In terms of precision and recall, the voting scheme is the most effective metric. We can also notice that each single graph-based metric (Betweenness, PageRank and Degree) outperforms TF*IDF, TF and the random baseline in terms of precision and F-measure, supporting Hypotheses 1, 2 and 3. Surprisingly, the term frequency metric obtains better results than TF*IDF. This could be explained by the fact that these metrics are calculated on domain terms already pre-filtered by the OntoCmaps tool (rather than on the whole corpus, as is usually done).

Table 2. Relationship (conceptual and hierarchical) filtering results

                             Precision   Recall   F-measure
VS Relation (conceptual)         60.96    38.61       47.27
Bet. Rel. (conc.)                48.02    30.41       37.24
Co-occ. Rel. (conc.)             45.63    29.02       35.48
PMI-G Rel. (conc.)               41.34    52.08       46.09
Random Rel. (conc.)              36.41    27.91       31.60
VS Rel. (hierarchical)           80.00    38.33       51.83
Bet. Rel. (hier.)                62.50     1.48        2.90
Co-occ. Rel. (hier.)             68.00     1.26        2.47
PMI-G Rel. (hier.)               66.34     5.12        9.51
Random Rel. (hier.)              65.15     3.19        6.09

Regarding conceptual relations (Table 2), graph-based metrics, and most importantly our voting scheme, again outperform the standard ones (PMI-G and frequency of co-occurrences) and the random baseline. Our Hypothesis 4 (selecting all the relationships that occur between important terms) gives the best results in terms of precision and F-measure. Betweenness (Hypothesis 5) slightly outperforms the baselines in terms of precision, but not in a significant way. We can also notice that PMI-G has the highest recall, making it an interesting metric (based on F-measure) for weighting conceptual relationships, after our voting scheme.

Regarding hierarchical relationships, our voting scheme again obtains the best precision, recall and F-measure. The voting scheme's recall benefits greatly from the addition of concepts involved in taxonomical and instance links (see Section 3.3 for details). The random baseline also has quite high precision, and all the other metrics have very low recall, which indicates that hierarchical relationships are generally extracted accurately. Thus, our conclusion is that graph-based filtering might not be beneficial for these hierarchical relations, as most of them seem correctly extracted and hence do not require further filtering. In fact, further experiments without filtering hierarchical relationships gave us a recall over 75% and a precision over 66%, thus significantly increasing recall at the price of a lower precision (compared with 80% in the filtered voting scheme). Finally, our results indicate that filtering based on the mean threshold may be too restrictive, leading to a low recall, and further experiments should be done to increase recall.

As a comparison, the ontology learning tool Text2Onto [4] obtained (on the same corpus) a precision/recall of only 31.71/25.16 for concept identification, 14.11/1.66 for conceptual relationships and 29.06/9.95 for hierarchical relationships. These results are far below those obtained with our tool OntoCmaps and our graph-based filtering metrics. Our work offers a novel research result justifying the interest of using graph-based metrics for concept and relationship filtering, especially when the performance of the proposed metrics is compared with that of standard measures (TF*IDF, PMI-G, frequency of co-occurrences) and of random baselines on the same corpus. Finally, we are currently planning experiments that involve more than one domain expert in the development of a larger gold standard, together with a detailed analysis of the results.

5. CONCLUSION
This paper presented an approach to ranking and filtering relevant terms and relationships in concept maps using metrics from graph theory. The novelty of the approach is that it relies on the inner structure of graphs to identify the important elements, without using any other knowledge source. The other contribution is that we addressed the problem of filtering concepts and relationships with good precision. Our approach may be beneficial not only for automatic extraction tools but also for the analysis of concept map repositories, as well as for the analysis of any graph-based representation of texts, such as co-occurrence graphs.

Our experiments showed that the voting scheme provides good concept identification precision. The other finding was that important relationships are better identified with the "relationships between important terms" metric. In general, the best metrics were always based on graph theory and centrality measures, which provides some evidence confirming our hypotheses. The approach was assessed using human evaluation and showed that a reasonable overlap can be obtained between our system's results and the human-built gold standard.

Our approach might be interesting for identifying conceptual structures in general, and for ontology learning and evolution in particular. In fact, the main difficulty in ontology acquisition from text is the identification of important concepts and relationships. The results presented in this paper suggest that using graph theory may be an interesting avenue for identifying ontological classes and relationships (taxonomical links and properties with domain and range) with a higher degree of precision. This should help, for instance, users in building high-quality ontologies through a semi-automatic process where an initial ontology design is based on the investigated measures. To address this challenge, in future work we plan to conduct user studies where such initial ontologies will be complemented with some novel metrics (based on empirically estimated recall values) that guide developers in the ontology refinement process. Our future work will also explore various ways to increase the obtained recall, as well as further experiments with human evaluators.

6. ACKNOWLEDGMENTS
Amal Zouaq is partly funded by a postdoctoral fellowship from the FQRNT.

7. REFERENCES
[1] Alani, H., Brewster, C. and Shadbolt, N. Ranking Ontologies with AKTiveRank. In Proc. of the 5th Int. Semantic Web Conference (ISWC), pp. 5-9, Springer-Verlag, 2006.
[2] Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M. and Etzioni, O. Open information extraction from the Web. In Proc. of the 20th Int. Joint Conf. on AI, pp. 2670-2676, ACM, 2007.
[3] Buitelaar, P. and Cimiano, P. (Eds.). Ontology Learning and Population: Bridging the Gap between Text and Knowledge, IOS Press, 2008.
[4] Cimiano, P. and Völker, J. Text2Onto. In Proc. of NLDB 2005, pp. 227-238, Springer, 2005.
[5] Coursey, K. and Mihalcea, R. Topic Identification Using Wikipedia Graph Centrality. In Proc. of the North American Chapter of the Association for Computational Linguistics (NAACL 2009), pp. 117-120, Colorado, 2009.
[6] De Marneffe, M.-C., MacCartney, B. and Manning, C. D. Generating Typed Dependency Parses from Phrase Structure Parses. In Proc. of LREC, pp. 449-454, ELRA, 2006.
[7] Downey, D., Etzioni, O., Soderland, S. and Weld, D. S. Learning text patterns for Web information extraction and assessment. In Proc. of the AAAI Workshop on Adaptive Text Extraction and Mining, 2004.
[8] Etzioni, O., Banko, M., Soderland, S. and Weld, D. S. Open information extraction from the web. Communications of the ACM, 51(12): 68-74, 2008.
[9] Gordon, J., Van Durme, B. and Schubert, L. Learning from the Web: Extracting General World Knowledge from Noisy Text. In Proc. of the AAAI 2010 Workshop on Collaboratively-built Knowledge Sources and AI, 2010.
[10] Hatala, M., Gašević, D., Siadaty, M., Jovanović, J. and Torniai, C. Can Educators Develop Ontologies Using Ontology Extraction Tools: an End User Study. In Proc. of the 4th Euro. Conf. on Technology-enhanced Learning, pp. 140-153, 2009.
[11] Hoser, B., Hotho, A., Jaschke, R., Schmitz, C. and Stumme, G. Semantic network analysis of ontologies. The Semantic Web: Research and Applications, volume 4011 of LNAI, pp. 514-529, Heidelberg, June 2006.
[12] JUNG. Last retrieved on October 25, 2010 from http://jung.sourceforge.net/
[13] Lin, T., Etzioni, O. and Fogarty, J. Identifying interesting assertions from the web. In Proc. of the 18th Conf. on Information and Knowledge Management, pp. 1787-1790, ACM, 2009.
[14] Mika, P. Ontologies Are Us: A Unified Model of Social Networks and Semantics. In Proc. of the International Semantic Web Conference, pp. 522-536, Springer, 2005.
[15] Navigli, R. and Velardi, P. Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites. Computational Linguistics, 30(2): 151-179, 2004.
[16] Navigli, R. and Lapata, M. An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 32(4): 678-692, 2010.
[17] Özgür, A., Vu, T., Erkan, G. and Radev, D. R. Identifying gene-disease associations using centrality on a literature mined gene-interaction network. Bioinformatics, 24(13): 277-285, 2008.
[18] Pantel, P. and Pennacchiotti, M. Automatically Harvesting and Ontologizing Semantic Relations. Ontology Learning and Population: Bridging the Gap between Text and Knowledge, pp. 171-198, IOS Press, 2008.
[19] Pantel, P. and Lin, D. Discovering Word Senses from Text. In Proc. of the SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 613-619, ACM, 2002.
[20] Salton, G. and Buckley, C. Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5): 515-523, 1988.
[21] SCORM. Last retrieved on October 25, 2010 from http://www.adlnet.gov/Technologies/scorm/SCORMSDocuments/SCORM%20Resources
[22] Soderland, S. and Mandhani, B. Moving from Textual Relations to Ontologized Relations. In Proc. of the AAAI Spring Symposium on Machine Reading, pp. 85-90, AAAI Press, 2007.
[23] Turney, P. D. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proc. of the 12th European Conference on Machine Learning (ECML), pp. 491-502, 2001.
[24] Veksler, V. D., Grintsvayg, A., Lindsey, R. and Gray, W. D. A proxy for all your semantic needs. In Proc. of the 29th Annual Meeting of the Cognitive Science Society, 2007.
[25] Xie, Z. Centrality measures in text mining: prediction of noun phrases that appear in abstracts. In Proc. of the ACL Student Research Workshop, pp. 103-108, ACL, 2005.
[26] Zouaq, A. and Nkambou, R. Evaluating the Generation of Domain Ontologies in the Knowledge Puzzle Project. IEEE Trans. on Knowledge and Data Engineering, 21(11): 1559-1572, 2009.
[27] Zouaq, A. An Ontological Engineering Approach for the Acquisition and Exploitation of Knowledge in Texts. PhD Thesis, University of Montreal (in French), 2008.