Relating Ontologies with a Fuzzy Information Model

Knowledge and Information Systems manuscript

Maria Angelica A. Leite · Ivan L. M. Ricarte


Abstract More people than ever before have access to information through the World Wide Web, and both the volume of information and the number of users continue to grow. Traditional keyword-based search methods are not effective: they return long lists of documents, many of which are unrelated to users' needs. One way to improve information retrieval is to associate meaning with users' queries by using ontologies, knowledge bases that encode a set of concepts about a domain and their relationships. Encoding a knowledge base in a single ontology is common, but a document collection can span different domains, each organized into its own ontology. This work presents a novel way to represent and organize knowledge from distinct domains using multiple ontologies that can be related to each other. The model allows the ontologies, as well as the relationships between concepts from distinct ontologies, to be represented independently. In addition, fuzzy set theory techniques are employed to deal with the subjectivity and uncertainty of knowledge. This approach to organizing knowledge and an associated query expansion method are integrated into a fuzzy information retrieval model based on multi-related ontologies. The performance of a search engine using this model is compared with that of another fuzzy approach to information retrieval and with the Apache Lucene search engine. Experimental results show that this model improves precision and recall.

Keywords Knowledge Organization · Fuzzy Information Retrieval · Query Expansion · Ontology

Maria Angelica A. Leite
Embrapa Agriculture Informatics, PO Box 6041, ZIP 13083-970, Campinas, SP, Brazil
Tel.: +55-19-3211-5700, Fax: +55-19-3211-5754

Ivan L. M. Ricarte
School of Electrical and Computer Engineering, University of Campinas, PO Box 6101, Postal Code 13083-970, Campinas, SP, Brazil
Tel.: +55-19-3521-3771, Fax: +55-19-3521-3845


1 Introduction

With the growing availability of information, much research has been devoted to intelligent ways of improving information access, and many information retrieval models and systems have been proposed. An information retrieval system stores and indexes documents so that, when users express their information needs in a query, the system retrieves the related documents and assigns a relevance score to each one [2]. Usually, documents are retrieved when they contain the terms or keywords specified in the user's query. However, other documents may contain the desired semantic information without containing the user-specified keywords, and the traditional approach leaves these relevant documents out. One way to deal with this limitation is to consider not only the lexical information explicit in documents but also its semantics, i.e., the meaning attached to it.

When working within a specific domain of knowledge, this problem can be overcome by incorporating into existing information retrieval systems a knowledge base that depicts term relationships. Knowledge bases can be developed manually by domain experts or constructed automatically from knowledge extracted from the document collection [18,37,55]. They can also be built from users' knowledge, as in folksonomies [45], and represented by conceptual structures [49,50]. In this context, the use of ontologies to organize knowledge and express semantic meaning has been gaining attention. A recent approach uses ontologies to expand queries, i.e., to infer new meaningful terms to be added to the initial query [3].

Information retrieval systems usually model knowledge and compose the knowledge base with a single conceptual structure. However, the knowledge needed to index a document collection can involve multiple distinct domains, and in some contexts concepts from these domains are related by causal, spatial, or similarity relationships. Each domain can be represented as a conceptual structure, such as a lightweight ontology. Lightweight ontologies include concepts, concept taxonomies, relationships between concepts, and properties that describe concepts [14]. Relationships between domain concepts can be translated into relationships between lightweight ontology concepts, producing a knowledge base composed of multi-related lightweight ontologies. Section 2 shows an example of how such situations arise in the real world.

To deal with the vagueness typical of human knowledge, fuzzy set theory [42] can be used to manipulate the knowledge in the bases. It also handles the uncertainty that may be present in document and query representations and in their relationships. Uncertainty and vagueness appear in many parts of the retrieval process: the user's expression is vague, the representation of a document's informative content is uncertain, and so is the process by which a query representation is matched to a document representation. The effectiveness of an information retrieval system is therefore related to its capability to deal with the vagueness and uncertainty of the retrieval process.

The novelty of the present work is a knowledge base representation that allows lightweight ontologies, each representing a distinct knowledge domain, to be related. Relationships among concepts within one ontology and across distinct ontologies are represented by fuzzy relations, with crisp relationships as a particular case. In this model, existing ontologies can be reused without modification, since their relationships with other ontologies are represented externally and independently. Related work is presented in Section 3, and the proposed model is described in Section 4.


To illustrate the applicability of this knowledge representation model, it has been used in an information retrieval application with multi-related ontologies, presented in Section 5. Given a query with concepts from distinct domains, new semantically related documents indexed by other domain ontologies can be retrieved through the ontology relationships. Inference in the knowledge base is performed by the proposed automatic query expansion method, which takes ontology concepts and their relationships into account and adds new concepts to the query before it is submitted to the retrieval system. Documents are indexed by the concepts in the ontologies, allowing retrieval by meaning; unlike in a faceted approach, documents need not be indexed by the concepts of every ontology. Results obtained with the proposed fuzzy information retrieval model are compared with those obtained using only the user's keywords, and with those of another fuzzy information retrieval system, the multi-relationship fuzzy concept network information retrieval model [6,16]. The proposed expansion method is also used to expand queries for the Apache Lucene [1] search engine. As presented in Section 6, the results show higher precision at the same recall levels.

2 Motivation Example

The motivation for this work is that knowledge can be expressed in multiple distinct domains related to each other. The relationships between concepts from distinct domains can be weighted to represent the strength of their relation, with larger weights indicating more closely related concepts. Consider, for example, Wine Agent 1.0 [31], used to illustrate a semantic web application. The wine and food ontologies represent distinct domains, but they are integrated so that the agent can infer which wines to recommend for a given food choice. Fig. 1 shows an excerpt of the food and wine lightweight ontologies and their relationships, using the knowledge representation proposed in this work: a knowledge base composed of related lightweight ontologies. The food and wine ontologies are independent; their relationships are shown as dashed lines. The relationships are weighted to indicate how well meals and wines pair: the larger the weight, the more appropriate the wine is for the related meal. Relationship weights are established from the wine values for the properties color, sugar, body, and flavor, and from the restrictions each meal course places on these properties. For example, pasta with regular red sauce pairs well with dry, red, medium-bodied wines of moderate flavor. It therefore pairs with strength 1.0 with the Mountadam Pinot Noir, which is red, dry, medium-bodied, and moderate in flavor. It pairs with the Elyse Zinfandel with strength 0.75, because this wine is red, dry, and of moderate flavor, but full-bodied. It pairs with the Mount Eden Vineyard Estate Pinot Noir with strength 0.5, because only half of the wine properties match the food restrictions: the wine is red and dry, but full-bodied and strong in flavor. Given this knowledge representation with weighted relationships, the application can draw conclusions. In the food and wine pairings, relationships with weight 1.0 indicate a perfect match, and thus the wines to choose. In this work, a framework is proposed to model related ontologies as motivated by this example, and an information retrieval application is developed using this knowledge representation. The next sections discuss these issues.
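The pairing strengths in this example behave like the fraction of the four properties (color, sugar, body, flavor) on which a wine satisfies the meal's restrictions. The following Python sketch reproduces the three quoted weights under that reading. Counting every property equally is our assumption, since the text only says the weights are established from these properties; the dictionary and function names are illustrative.

```python
# Sketch: pairing strength as the fraction of matching properties.
# Assumption: each of the four properties counts equally; the text gives
# no explicit formula, only the resulting weights.

PROPERTIES = ("color", "sugar", "body", "flavor")

# Restrictions posed by the meal course, as described in the example.
pasta_regular_red_sauce = {
    "color": "red", "sugar": "dry", "body": "medium", "flavor": "moderate",
}

# Wine property values, as described in the example.
wines = {
    "Mountadam Pinot Noir": {
        "color": "red", "sugar": "dry", "body": "medium", "flavor": "moderate",
    },
    "Elyse Zinfandel": {
        "color": "red", "sugar": "dry", "body": "full", "flavor": "moderate",
    },
    "Mount Eden Vineyard Estate Pinot Noir": {
        "color": "red", "sugar": "dry", "body": "full", "flavor": "strong",
    },
}


def pairing_strength(meal: dict, wine: dict) -> float:
    """Fraction of properties on which the wine meets the meal's restriction."""
    matches = sum(1 for p in PROPERTIES if meal[p] == wine[p])
    return matches / len(PROPERTIES)


for name, props in wines.items():
    print(f"{name}: {pairing_strength(pasta_regular_red_sauce, props)}")
```

Run as is, it prints 1.0, 0.75, and 0.5 for the three wines, matching the weights quoted above.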


[Figure: excerpt of the Food ontology (Pasta, Pasta with regular red sauce, Pasta with spicy red sauce, Red meat, Regular red meat, Spicy red meat) and the Wine ontology (Zinfandel, Elyse, Marietta, Saucelito Canyon, Mountadam, Mount Eden Vineyard Estate Pinot Noir), linked by dashed pairing relationships weighted 0.5, 0.75, or 1.0.]

Fig. 1 Related wine and food ontologies.

3 Related Works Based on Knowledge

In recent decades, knowledge representations employing ontologies have gained popularity, and many ontologies covering different domains have been made available. In this context, the reuse of independently developed ontologies becomes more relevant, as multiple ontologies need to be accessed from different systems. The distributed nature of ontology development has led to dissimilar ontologies for the same or overlapping domains, so parties with different ontologies do not fully understand each other [7]. Reusing existing ontologies is only possible with considerable effort spent on integrating them [20]. To reuse different ontologies together, they have to be combined in some way: either they are integrated, i.e., merged into one new ontology, or they are kept separate and mapped to each other [21,44]. In both cases, the ontologies have to be aligned, i.e., brought into mutual agreement, by finding the places where they overlap [19]. Finally, the integrated ontologies must be checked for the consistency, coherence, and non-redundancy of the result. There are tools to support the ontology integration task [32,36]. In some situations, the structure of the involved ontologies is changed in order to integrate them. Another approach is to construct ontologies in a modular manner and provide interfaces so that their knowledge bases can be accessed by other ontologies only through these interfaces [11]. Integrated ontologies are used mainly for data integration [8,30,51], but some works employ ontology integration for information retrieval [35,52]. In the proposed approach, ontologies describe distinct domains that do not overlap but can be related by causal, spatial, or similarity associations. These related ontologies are then used in an information retrieval application. In general, ontology-based information retrieval works use just one ontology to encode the knowledge in the documents. This section presents knowledge representation works that encode knowledge for information retrieval applications with the aim of improving retrieval performance.


By combining lexical-syntactic and statistical learning approaches, Lau et al. proposed a fuzzy domain ontology mining algorithm to support ontology engineering [23]. They use a single ontology as the knowledge base and present studies confirming that using a fuzzy domain ontology leads to significant improvement in information retrieval. In Semantic Portal [57], it is possible to perform traditional text retrieval using keywords, as well as logical retrieval using ontology knowledge with formal queries and ontology reasoning. The model extends the search capabilities of existing methods and can answer more complex search requests. An index structure [29,28] that combines an inverted index, a spatial index, and an ontology-based structure is used to retrieve documents using both their text and the geographic references they contain. With this index structure it is possible to solve pure textual queries, pure spatial queries, textual queries with place names, and textual queries over a geographic area. The multi-relationship fuzzy concept network information retrieval model [6,16] employs knowledge encoded as a fuzzy conceptual network. Each node can be related to another by three relation types $V_r : C \times C \to [0, 1]$, where $C$ is the concept set and $r \in \{P, G, S\}$ denotes the fuzzy positive association (P), the fuzzy generalization association (G), and the fuzzy specialization association (S). These relations are constructed automatically from word co-occurrence in the documents. The implicit relationships between concepts are inferred by calculating the transitive closure of each relation, resulting in new relations $V_r^*$. Documents are associated with concepts by a fuzzy relation $U : D \times C \to [0, 1]$, where $D$ is the document set. Using the transitive closure relations $V_r^*$, the system infers new concepts to be associated with documents, resulting in the expanded document descriptor relations $U_r^* = U \otimes V_r^*$.
The query $q$ is composed of concepts from the concept network. When a query is executed, the system uses the expanded document descriptor relations to calculate the degree of satisfaction $DS_r(d_i)$ to which document $d_i \in D$ satisfies the user's query $q$. The degrees of satisfaction obtained with the different fuzzy relations are aggregated into the overall satisfaction of the query. The aggregation assigns a score to each document, and documents are presented to the user in decreasing order of score. The fuzzy ontological relational model [43] employs a fuzzy ontology as its knowledge base, with concepts representing the categories and keywords of a domain. When the user enters a query composed of concepts, the system expands it and may add new concepts based on the ontology knowledge. After expansion, the similarity between the query and the documents is calculated by fuzzy operations. The OnAir system [40] is an ontology-aided information retrieval system used to retrieve video fragments from video collections. The ontology is used to expand the user query, which is expressed in natural language. In the expansion process, query terms present in the ontology are given a larger weight, and new terms are added to the query according to the ontology knowledge. Tests indicate that using the ontology knowledge yields better recall and precision in the retrieval process. The OntoExplo system [15] retrieves documents according to two types of domain knowledge, content and task, both modeled as ontologies. Content is expressed by a domain ontology representing the domain treated in the documents. Task is represented by a second ontology and corresponds to the extraction of the domain structure and to scientific monitoring. Within this organization, a user can provide an author name, and the system shows all the documents related to that author.
Alternatively, if the user selects a concept from the content ontology, all related authors and publications are displayed. The OWLIR system [12,48] retrieves documents using both plain text and semantic annotations provided by knowledge in an ontology. Tests measured precision and recall for document retrieval with documents indexed by text only and by text with semantic markup provided by the ontology knowledge. The average precision of text with semantic markup was higher than that of text only. Quan et al. [47] proposed an ontology-based fuzzy retrieval framework for digital libraries that uses a fuzzy ontology to represent uncertain information in a digital library, and fuzzy queries to retrieve information from the fuzzy ontology. In their architecture, scholarly knowledge is represented by means of an ontological formalism such as a scholarly ontology. Each digital library has its own single scholarly ontology, and the ontologies are not related to each other as they are in the approach proposed in this work. Their fuzzy retrieval service can access information from multiple digital libraries and match results to present to users. Performance evaluation was based on the precision and recall of both crisp and fuzzy queries. Crisp queries performed well in recall but less well in precision, whereas fuzzy queries did not significantly improve recall but did improve precision. Jalali and Borujerdi [17] propose a hybrid retrieval algorithm for discovering relevance between queries and documents based on a combination of keyword- and concept-based approaches. Their approach assumes that improvements can be achieved by exploring relations between concepts in an ontology as well as their statistical dependencies over the corpus. It uses knowledge from the MeSH (Medical Subject Headings) metathesaurus to calculate concept-based similarities and to detect documents that are semantically close to users' queries without necessarily sharing keywords. In addition, it expands initial queries with concepts introduced by pseudo-relevance feedback, which captures relations between queries and documents based on the statistical dependencies between the concepts they contain.
Experimental results on a subset of MEDLINE database documents show a 21% improvement over the keyword-based approach in terms of mean average precision. Parry [39] argues that particular issues arise from the use of multiple ontologies and suggests a fuzzy ontology to provide a common framework, or base ontology. In his case, however, the ontology deals with a single domain, and only "is-a" relations are currently used, based on the existing MeSH hierarchy. His proposed ontology captures different membership values associated with different users and groups: each user would have their own membership values assigned to terms in the ontology, reflecting their likely information needs and world view. Parry's approach differs from the present work because his ontologies capture the views of different users or groups within the same domain, whereas here each ontology represents a distinct domain. Some systems use the faceted approach [5] to classify information resources. A facet classification is a set of mutually exclusive and jointly exhaustive categories, each made by isolating one perspective on the resources (a facet), which combine to completely describe all the objects in question. Users can search and browse the facets to find what they need. Ontologies can be used to represent knowledge in the facets, but in this case each ontology represents a different aspect, or facet, of the information resources, and the ontologies are not related. The DocCube system [33] promotes an approach in which information searching and exploration take place in a domain-dependent semantic context. A given context is described by concept hierarchies or ontologies that depict different facets of documents. The ontologies allow users to explore the information space, i.e., the domain vocabulary and the domain structure, so users never lose the semantic context of their current query or interest formulation. Another example is the enhanced faceted semantic browser [54], which uses an enhanced faceted navigation system with support for personalization and collaboration. It employs semantic web technologies for multimedia information retrieval. Facets are generated automatically from multimedia domain knowledge provided by an ontology and from users' preferences.


In a document collection, the knowledge contained in documents can be expressed by distinct, non-overlapping domains, each represented by an independent ontology. In some contexts, knowledge from these domains can be related by causal, spatial, or similarity associations. The model proposed in this work exploits this kind of related knowledge in information retrieval. It differs from other models in that it employs multiple ontologies that neither overlap nor represent orthogonal knowledge as in the faceted approach; instead, knowledge in the ontologies can be related according to context. The model allows knowledge to be expressed in distinct lightweight ontologies and offers a way to represent each ontology independently, as well as the relationships among them. This knowledge representation relates ontologies to each other but keeps each ontology structure separate.

4 Information Retrieval Model

Any knowledge-based information retrieval model must define how knowledge is encoded and how documents and queries are represented, so that the model can establish a relevance function relating a user query to a set of retrieved documents. The following sections present these formulations, as well as a way to use the knowledge base to expand user queries and retrieve documents that, although they do not contain the terms given in the user query, are nonetheless related to the user's needs.

4.1 Knowledge Representation

The proposed knowledge representation uses multiple ontologies, each representing a distinct domain of interest. Although ontologies may represent complex associations and rules, this work focuses on the relationships between concepts of distinct ontologies. Therefore, without loss of generality, each ontology is considered to be a lightweight ontology, i.e., a concept hierarchy with specialization and generalization relationships between its own concepts.
Formally, a lightweight ontology is represented by a set $D_k$ of concepts $c_{ky}$ representing one domain of interest, $D_k = \{c_{k1}, c_{k2}, \dots, c_{ky}, \dots, c_{km}\}$, with $1 \le y \le m$, $m = |D_k|$, and $1 \le k \le K$, where $K$ is the number of domains. The concepts of an ontology are organized as a taxonomy and are related by fuzzy specialization associations S and their inverse, the fuzzy generalization associations G. A concept is regarded as a fuzzy generalization of another concept if it consists of that concept or includes it in a partitive sense. A concept is regarded as a fuzzy specialization of another concept if it is part of that concept or a kind of that concept.

Definition 1 Given a set $D_k$ with concepts from a domain of interest,
1. The fuzzy generalization association is a fuzzy relation $R^G_k : D_k \times D_k \to [0, 1]$ that is not symmetric, not reflexive, and transitive.
2. The fuzzy specialization association is a fuzzy relation $R^S_k : D_k \times D_k \to [0, 1]$ that is not symmetric, not reflexive, and transitive, with $R^S_k = (R^G_k)^{-1}$.

Besides the explicit relationships between concepts expressed by $R^G_k$ and its inverse, implicit relationships are given by the weighted transitive closure of the fuzzy generalization and fuzzy specialization associations. The weighted transitive closures of $R^G_k$ and $R^S_k$ are the relations $R^{G*}_k$ and $R^{S*}_k$, respectively. The following definition expresses how to compute the weighted transitive closure.


Definition 2 The weighted transitive closure $R^*$ of a fuzzy relation $R$ can be determined by an iterative algorithm consisting of the following steps:
1. Compute $R' = R \cup [w_e^t (R \circ R)]$, where $w_e^t \in [0, 1]$ and $t \in \{G, S\}$;
2. If $R' \neq R$, let $R = R'$ and go to step 1; otherwise, $R^* = R'$ and the algorithm terminates.

In step 1, $R \circ R$ denotes the composition of two fuzzy relations [41]. The composition of two fuzzy relations $P : X \times Y$ and $Q : Y \times Z$ is the fuzzy relation $R : X \times Z$ given by equation 1.

$$R(x, z) = (P \circ Q)(x, z) = \max_{y \in Y} \min[P(x, y), Q(y, z)] \qquad (1)$$
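Definition 2 and equation 1 translate directly into matrix operations. The NumPy sketch below assumes that the fuzzy union $\cup$ is the element-wise maximum and that the weight $w_e^t$ scales the composed values; the orientation `R_G[x, y]` ("x generalizes y") and the `boundary` cut-off parameter are our illustrative choices. The cut-off stands in for the minimum association strength the paper mentions, applied once after convergence.

```python
import numpy as np


def max_min_composition(P: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Equation (1): (P o Q)(x, z) = max_y min[P(x, y), Q(y, z)]."""
    # Broadcast to shape (|X|, |Y|, |Z|), take pairwise minima, maximize over y.
    return np.max(np.minimum(P[:, :, None], Q[None, :, :]), axis=1)


def weighted_transitive_closure(R: np.ndarray, w: float, boundary: float = 0.0) -> np.ndarray:
    """Definition 2: iterate R' = R U [w (R o R)] until R' == R.

    Fuzzy union is taken as the element-wise max (assumption). `boundary`
    is a hypothetical cut-off that zeroes associations weaker than it.
    """
    R = R.astype(float)
    while True:
        R_next = np.maximum(R, w * max_min_composition(R, R))
        if np.array_equal(R_next, R):  # fixed point reached: R* = R'
            break
        R = R_next
    return np.where(R >= boundary, R, 0.0)


# Hypothetical three-level taxonomy: R_G[x, y] = degree to which x generalizes y.
R_G = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 0.8],
                [0.0, 0.0, 0.0]])
R_S = R_G.T  # Definition 1: the specialization relation is the inverse of R_G

R_G_star = weighted_transitive_closure(R_G, w=0.5)
print(R_G_star[0, 2])  # two-step link: 0.5 * min(1.0, 0.8)
```

With $w_e^t = 0.5$, the implicit link between the top and bottom concepts gets strength $0.5 \times \min(1.0, 0.8) = 0.4$, showing how $w_e^t < 1$ penalizes longer paths.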

Weights $w_e^t < 1$ penalize the association strength between distant concepts in the ontology: as the distance between concepts increases, the composition reduces the strength of their association. To discard associations with very low strength, a boundary value can be defined that establishes the minimum value required for an association to be taken into account. In addition to specialization and generalization relationships within one ontology, concepts from distinct ontologies can be related, thus integrating several ontologies into one knowledge base. The model takes into account domains whose concepts do not overlap but can be related in some contexts. Fig. 2 illustrates a knowledge base with two ontologies: domain $D_i$, with concepts $c_{i1}$ to $c_{i5}$, and domain $D_j$, with concepts $c_{j1}$ to $c_{j6}$. The dashed line indicates an association between concepts $c_{i4}$ and $c_{j2}$ from the two ontologies.

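[Figure: lightweight ontologies $D_i$ (concepts $c_{i1}$ to $c_{i5}$) and $D_j$ (concepts $c_{j1}$ to $c_{j6}$), with a dashed association between $c_{i4}$ and $c_{j2}$.]

Fig. 2 Knowledge base composed of two related ontologies.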

Relationships between ontologies can be of distinct natures, such as causal, spatial, and similarity. For example, a causal relation might be established between the concepts plague and virus from two distinct ontologies. Similarly, a spatial relation might be established between the concept Northeast from a lightweight ontology of geographic regions and the concept Semi-arid from a climate ontology, and a similarity relation between the concepts storage and silo. Such relationships between concepts belonging to distinct ontologies are represented by the fuzzy positive association P, defined next.

Definition 3 Consider two sets $D_i$ and $D_j$ representing concepts of distinct but related domains. The fuzzy positive association is a fuzzy relation $R^P_{ij} : D_i \times D_j \to [0, 1]$ that is not symmetric, not reflexive, and not transitive.


The fuzzy positive association $R^P_{ij}$ indicates the strength with which concept $c_{iy}$ from domain $D_i$ is positively associated with concept $c_{jy}$ from domain $D_j$. A value of zero indicates that there is no positive association between the concepts. Fig. 3 illustrates the knowledge representation model. Domains $D_i$ and $D_j$ are represented by lightweight ontologies with specialization (S) and generalization (G) relationships among their own concepts $c_{iy}$ and $c_{jy}$, respectively. To express relationships between concepts $c_{iy}$ and $c_{jy}$ from ontologies of the distinct domains $D_i$ and $D_j$, the fuzzy positive associations $P_{ij}$ and $P_{ji}$ are used.

Fig. 3 Knowledge representation.

4.2 Document Representation

In a knowledge-based information retrieval model, documents are associated with concepts. A relation $U_k(d_l, c_{ky})$ indicates the degree of association between document $d_l$ from the set of all documents $DOC$ and concept $c_{ky} \in D_k$. The relation $U_k$ indicates how relevant the concept is for representing the document content, and is represented by a matrix of dimension $|DOC| \times |D_k|$. Matrix element values can be calculated by a tf-idf (term frequency, inverse document frequency) computation [2] as follows.

Definition 4 Let $N$ be the total number of documents in the system, $c_{ky}$ a concept from domain $D_k$ where $1 \le y \le |D_k|$, and $n_y$ the number of documents in which concept $c_{ky}$ appears. Let $r_{ly}$ be the raw frequency of concept $c_{ky}$ in document $d_l$ (the number of times $c_{ky}$ is mentioned in $d_l$). Then the normalized frequency $f_{ly}$ of concept $c_{ky}$ in document $d_l$ is given by equation 2.

$$f_{ly} = \frac{r_{ly}}{\max_t r_{lt}} \qquad (2)$$

The maximum is computed over all terms $t$ mentioned in document $d_l$. If concept $c_{ky}$ does not appear in $d_l$, then $f_{ly} = 0$. Furthermore, let $idf_y$, the inverse document frequency of $c_{ky}$, be given by equation 3.

$$idf_y = \log \frac{N}{n_y} \qquad (3)$$

The tf-idf weight $u_{ly}$ of concept $c_{ky}$ in document $d_l$ is given by equation 4.

$$u_{ly} = f_{ly} \times idf_y \qquad (4)$$
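Definition 4 can be computed in a few lines. The sketch below builds one row of $U_k$ per document, assuming each document is given as a dictionary from every mentioned term to its raw frequency, so that $\max_t r_{lt}$ runs over all terms and not only over the domain concepts. The natural logarithm is our assumption, since the text does not fix the base; the function name `concept_weights` and the sample documents are hypothetical.

```python
import math


def concept_weights(docs: list[dict[str, int]], concepts: list[str]) -> list[list[float]]:
    """Build the |DOC| x |D_k| matrix U_k of tf-idf weights u_ly (Definition 4).

    docs: one dict per document mapping every mentioned term to its raw
    frequency r_lt; concepts of D_k are assumed to appear as terms.
    """
    N = len(docs)
    # n_y: number of documents in which concept c_ky appears
    n = {c: sum(1 for d in docs if d.get(c, 0) > 0) for c in concepts}
    U = []
    for d in docs:
        max_r = max(d.values()) if d else 0  # max_t r_lt over all terms in d_l
        row = []
        for c in concepts:
            r = d.get(c, 0)
            if r == 0:
                row.append(0.0)  # f_ly = 0 when the concept is absent
                continue
            f = r / max_r                 # equation (2)
            idf = math.log(N / n[c])      # equation (3), natural log assumed
            row.append(f * idf)           # equation (4)
        U.append(row)
    return U


# Hypothetical collection: "wine" is a plain term, not a concept of D_k,
# but it still determines the normalizing maximum in the first document.
docs = [{"pasta": 1, "wine": 2}, {"pasta": 1}, {"meat": 3}]
U = concept_weights(docs, ["pasta", "meat"])
```

In the first document, "pasta" gets $f = 1/2$ because "wine" is the most frequent term there, so $u = 0.5 \log(3/2)$; in the second it gets the full $\log(3/2)$.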


4.3 Query Representation

In an information retrieval application, users express their needs by defining a query with search terms from the valid vocabulary. A knowledge-based information retrieval model associates these terms with concepts from the domains of interest, and the concepts are connected by logical operators. The logical expression of a user query $q_{user}$ can be transformed into conjunctive normal form. Each term $q_h$, $1 \le h \le T_q$, of this form expresses a subquery composed of a set of concepts connected by the OR logical operator. The user query $q_{user}$ combines the subqueries $q_h$ with the AND logical operator, as in equation 5.

$$q_{user} = \bigwedge_{h=1}^{T_q} q_h \qquad (5)$$

Once the user query $q_{user}$ is in conjunctive normal form, each resulting subquery can be processed independently. As described in Section 4.2, documents are associated with the concepts of each domain through distinct relations $U_k(d_l, c_{ky})$. To deal with this, subqueries are partitioned so that the concepts of each domain are taken separately. Each partition is a Boolean set whose dimension equals the number of concepts in the associated domain, with values that simply indicate the presence (1) or absence (0) of each concept in the query. A subquery $q_h$ is thus partitioned into Boolean sets $q_i$, $1 \le i \le K$, where $K$ is the number of domains. For example, given the domains $D_1 = \{c_{11}, c_{12}, c_{13}\}$ and $D_2 = \{c_{21}, c_{22}, c_{23}, c_{24}\}$, a valid user query in this format would be $q_{user} = (c_{11} \vee c_{22}) \wedge (c_{13} \vee c_{24})$. In this scenario, the subquery $q_h = (c_{11} \vee c_{22})$ is partitioned as $q_1 = [1\ 0\ 0]$ and $q_2 = [0\ 1\ 0\ 0]$. When a subquery $q_h$ is executed, it retrieves a document set $V_h$. The final result $V$ for the user query $q_{user}$ is the intersection of the document sets $V_h$, as indicated in equation 6.

$$V = \bigcap_{h=1}^{T_q} V_h \qquad (6)$$
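The partitioning step and equation 6 can be sketched as follows, reusing the two example domains from the text. Each subquery is a list of concept names (their OR), and the user query is a list of subqueries (their AND). How each subquery retrieves its document set $V_h$ is not modeled here, so the sets passed to `combine_results` are hypothetical; both helper names are ours.

```python
# Domains from the example in the text.
DOMAINS = {
    1: ["c11", "c12", "c13"],
    2: ["c21", "c22", "c23", "c24"],
}


def partition_subquery(subquery: list[str], domains: dict[int, list[str]]) -> dict[int, list[int]]:
    """Split an OR-subquery q_h into one Boolean vector q_i per domain D_i."""
    return {
        i: [1 if c in subquery else 0 for c in concepts]
        for i, concepts in domains.items()
    }


def combine_results(result_sets: list[set[str]]) -> set[str]:
    """Equation (6): the final result V is the intersection of the sets V_h."""
    return set.intersection(*result_sets) if result_sets else set()


# q_user = (c11 OR c22) AND (c13 OR c24), already in conjunctive normal form.
q_user = [["c11", "c22"], ["c13", "c24"]]
partitions = [partition_subquery(q_h, DOMAINS) for q_h in q_user]
print(partitions[0])  # {1: [1, 0, 0], 2: [0, 1, 0, 0]}

# Hypothetical document sets V_h retrieved by the two subqueries.
V = combine_results([{"d1", "d2", "d5"}, {"d2", "d3", "d5"}])
print(sorted(V))  # ['d2', 'd5']
```

The first partition reproduces $q_1 = [1\ 0\ 0]$ and $q_2 = [0\ 1\ 0\ 0]$ from the text.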

4.4 Query Expansion

It is difficult to express an information need using exact query terms. Query expansion lets users carry out searches that include other terms related to the original query terms. In the proposed query expansion method, new terms are added based on the knowledge expressed in the knowledge base of related ontologies. The model performs automatic query expansion: weights are calculated for all concepts in the knowledge base, and selected concepts are added with their weights to the initial query. The proposed method is performed in two steps. In the first step, each partition $q_i$ of the initial subquery $q_h$ is expanded to take into account the relations between the domain $D_i$ associated with the partition and the other domains of the knowledge base. This step performs expansion across the distinct domains, each represented by a lightweight ontology. For each partition $q_i$, $K - 1$ new sets are generated, each containing concepts from another domain $D_j$, $j \neq i$, $1 \le i, j \le K$, associated with concepts present in $q_i$. This process generates a new expanded query, denoted $qe$. The first expansion step is expressed by equation 7, where index $i$ refers to the domain of partition $q_i$ and $j$ ranges over the domains of the knowledge base.


$$qe = \bigvee_{i=1}^{K} \bigvee_{j=1}^{K}
\begin{cases}
q_i, & j = i\\
w_P\,\big(q_i \circ R^P_{ij}\big), & j \neq i
\end{cases} \tag{7}$$

This first step uses the fuzzy positive association R^P_ij between concepts from domains D_i and D_j. The model allows a value w_P ∈ [0, 1] to be assigned as a weight for the fuzzy positive association, so that its influence on the expansion process can be adjusted. The query partition is not expanded within its own domain (j = i). For the other domains (j ≠ i), the expansion calculates the fuzzy relational image of the partition q_i under the fuzzy positive association. This image relates concepts from domain D_j to concepts from domain D_i. The fuzzy relational image [34] of a fuzzy set A : X under a fuzzy relation R : X × Y is the fuzzy set B : Y defined by equation 8.

$$B(y) = (A \circ R)(y) = \max_{x \in X} \min\big[A(x), R(x, y)\big] \tag{8}$$
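Equation 8 is the standard max-min composition. A minimal Python sketch, assuming a fuzzy set is a list of membership degrees over X and a relation is a list of rows indexed by X (the toy values are arbitrary):

```python
def relational_image(A, R):
    """Fuzzy relational image B = A o R (eq. 8).

    A holds membership degrees over X; R is a list of rows, R[x][y] in [0, 1].
    B(y) = max over x of min(A(x), R(x, y)).
    """
    n_y = len(R[0])
    return [max(min(a, row[y]) for a, row in zip(A, R)) for y in range(n_y)]


A = [0.2, 1.0, 0.6]
R = [
    [0.9, 0.1],
    [0.3, 0.5],
    [0.4, 0.8],
]
print(relational_image(A, R))  # [0.4, 0.6]
```

Each output degree is bounded by both the membership of some x in A and the strength of its link to y, which is why a weak association can never inflate a concept's weight.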

Each expansion generates a new set corresponding to domain D_j. The new set contains concepts from domain D_j, and their values denote the degree to which these concepts are related to the concepts of partition q_i through the fuzzy positive association. To illustrate the first expansion step, consider two domains D1 and D2 and a subquery q_h = q1 ∨ q2 partitioned across both domains. Fig. 4 shows the result of the expansion process. Each partition q_i, 1 ≤ i ≤ 2, of the initial subquery q_h is expanded into the other domain, generating the partitions qe_ij, 1 ≤ j ≤ 2, which make up the expanded subquery qe.

Fig. 4 Query expansion process.

Fig. 5 exemplifies the first step of the expansion with a knowledge base composed of domains D1 and D2. In the figure, concepts c14 and c22, belonging to domains D1 and D2 respectively, are associated by the fuzzy positive association represented by dashed lines. Let the fuzzy positive association value between these concepts be 1, i.e., R^P_12[c14, c22] = R^P_21[c22, c14] = 1, and let the weight be w_P = 0.7. In this case, the example subquery q_h = c14, composed of just one concept, is represented by q1 = [0 0 0 1 0]. After the first expansion step, the resulting expanded query is qe = (c14 or c22). In partitioned form, the query is qe = ([0 0 0 1 0] ∨ [0 0.7 0 0 0 0]). In this expansion step, concept c22 is added to the expansion of concept c14, and the partitioned form stores the weight (0.7 = w_P · R^P_12[c14, c22]) with which the newly added concept c22 is associated with the expanded concept c14.

Fig. 5 First step of subquery expansion in domains D1 and D2, taking into account the fuzzy positive association (P).
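The first expansion step (equation 7) can be sketched in Python as follows. The sketch assumes partitions are lists of membership degrees and that R^P_ij is stored as a |D_i| × |D_j| matrix keyed by (i, j). The function names are hypothetical. Applied to the Fig. 5 example, it reproduces qe = ([0 0 0 1 0] ∨ [0 0.7 0 0 0 0]).

```python
def relational_image(A, R):
    # B(y) = max_x min(A(x), R(x, y))  (eq. 8)
    return [max(min(a, row[y]) for a, row in zip(A, R)) for y in range(len(R[0]))]


def expand_between_domains(partitions, RP, w_P):
    """First expansion step (eq. 7): returns qe[i][j] for every pair of domains.

    partitions[i] is q_i over domain D_i; RP[(i, j)] is the |D_i| x |D_j|
    fuzzy positive association R^P_ij.
    """
    K = len(partitions)
    qe = [[None] * K for _ in range(K)]
    for i, q_i in enumerate(partitions):
        for j in range(K):
            if i == j:
                qe[i][j] = [float(v) for v in q_i]  # no expansion in its own domain
            else:
                qe[i][j] = [w_P * v for v in relational_image(q_i, RP[(i, j)])]
    return qe


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Fig. 5: D1 = c11..c15, D2 = c21..c26; c14 and c22 are associated with value 1.
RP12, RP21 = zeros(5, 6), zeros(6, 5)
RP12[3][1] = 1.0  # R^P_12[c14, c22]
RP21[1][3] = 1.0  # R^P_21[c22, c14]

q1 = [0, 0, 0, 1, 0]      # subquery q_h = c14
q2 = [0, 0, 0, 0, 0, 0]   # no D2 concept in the subquery
qe = expand_between_domains([q1, q2], {(0, 1): RP12, (1, 0): RP21}, w_P=0.7)
print(qe[0][0], qe[0][1])  # [0.0, 0.0, 0.0, 1.0, 0.0] [0.0, 0.7, 0.0, 0.0, 0.0, 0.0]
```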

After the expansion among domains, the second step is performed. This step expands subquery qe using only the knowledge expressed within each ontology. First, each partition of qe is transposed. Then, using knowledge from within the domains, each transposed partition qe^T_ij, 1 ≤ i, j ≤ K, is expanded to take into account the fuzzy generalization and fuzzy specialization associations between the concepts of its domain D_j. This expansion generates the final transposed expanded subquery qx^T, as shown in equation 9. The association type, specialization or generalization, is denoted by the index r ∈ {S, G}, and R^*_rj is the weighted transitive closure of association type r in domain D_j. The model allows a value w_r ∈ [0, 1], r ∈ {S, G}, to be assigned as a weight for each association type, so that the expansion can be adjusted to emphasize one association type over the other.

$$qx^T = \bigvee_{i=1}^{K} \bigvee_{j=1}^{K} \max\Big(qe^T_{ij},\; w_S\,\big(R^{*}_{S_j} \circ qe^T_{ij}\big),\; w_G\,\big(R^{*}_{G_j} \circ qe^T_{ij}\big)\Big) \tag{9}$$

Fig. 4 shows the result of the expansion inside domains for the subquery qe. For example, partition qe^T_12 is expanded again with concepts from domain D2, taking into account the fuzzy specialization association, which yields qe^T_12(S), and the fuzzy generalization association, which yields qe^T_12(G). The final expanded partition is qx^T_12 = max(qe^T_12, qe^T_12(S), qe^T_12(G)), i.e., the element-wise maximum of the partition qe^T_12 from the first expansion step and the partitions qe^T_12(S) and qe^T_12(G) from the second step, in which the fuzzy association types r ∈ {S, G} of domain D2 are taken into account. Fig. 6 illustrates the second step of subquery expansion with a knowledge base composed of domains D1 and D2. In the lightweight ontology for domain D1, concept c11 is more general and concept c15 is more specific than concept c14. In domain D2, concept c21 is more general and concepts c23 and c24 are more specific than concept c22. In this expansion step, concepts more general and more specific than those already present in qe are added to the subquery, and the query qx = (c11 or c14 or c15) or (c21 or c22 or c23 or c24) is obtained. Assume w_G = 0.3 and w_S = 0.7, and let all fuzzy specialization and generalization values between concepts in the D1 and D2 ontologies be equal to 1. The subquery in partitioned form is then qx = ([0.3 0 0 1 0.7] ∨ [0.21 0.7 0.49 0.49 0 0]); for instance, c23 receives w_S · min(1, 0.7) = 0.49 and c21 receives w_G · min(1, 0.7) = 0.21. The partitioned form stores the weights associated with the newly added concepts.

Fig. 6 Second step of subquery expansion in domains D1 and D2 involving fuzzy specialization association (S) and fuzzy generalization association (G).
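A corresponding Python sketch of the second step (equation 9) for a single transposed partition follows. It assumes R^*_S[x][y] is the degree to which x specializes y and R^*_G[x][y] the degree to which x generalizes y. It uses only the direct links of Fig. 6 as R^*, because no transitive entry falls in the columns of c14 or c22 in this example. It reproduces the partitioned form qx = ([0.3 0 0 1 0.7] ∨ [0.21 0.7 0.49 0.49 0 0]), with values rounded to three decimals. The function names are hypothetical.

```python
def compose_column(R, v):
    # (R o v)(x) = max_y min(R[x][y], v[y]): relational image of a column vector
    return [max(min(r, vy) for r, vy in zip(row, v)) for row in R]


def expand_inside_domain(qe_T, RS_star, RG_star, w_S, w_G):
    """Second expansion step (eq. 9) for one transposed partition qe^T_ij."""
    qe_T = [float(v) for v in qe_T]
    spec = [w_S * v for v in compose_column(RS_star, qe_T)]  # more specific concepts
    gen = [w_G * v for v in compose_column(RG_star, qe_T)]   # more general concepts
    return [round(max(e, s, g), 3) for e, s, g in zip(qe_T, spec, gen)]


def links(n, pairs):
    # n x n relation with R[x][y] = 1.0 for each (x, y) in pairs
    R = [[0.0] * n for _ in range(n)]
    for x, y in pairs:
        R[x][y] = 1.0
    return R


# D1 = c11..c15: c11 is more general and c15 more specific than c14.
RS1 = links(5, [(4, 3), (3, 0)])          # R_S[x][y]: x specializes y
RG1 = links(5, [(3, 4), (0, 3)])          # R_G[x][y]: x generalizes y
# D2 = c21..c26: c21 is more general; c23 and c24 are more specific than c22.
RS2 = links(6, [(2, 1), (3, 1), (1, 0)])
RG2 = links(6, [(1, 2), (1, 3), (0, 1)])

qx1 = expand_inside_domain([0, 0, 0, 1, 0], RS1, RG1, w_S=0.7, w_G=0.3)
qx2 = expand_inside_domain([0, 0.7, 0, 0, 0, 0], RS2, RG2, w_S=0.7, w_G=0.3)
print(qx1)  # [0.3, 0.0, 0.0, 1.0, 0.7]
print(qx2)  # [0.21, 0.7, 0.49, 0.49, 0.0, 0.0]
```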

4.5 Document Relevance

Once an information retrieval system executes a user query, the expectation is that the most relevant documents in the collection will be returned. Usually the system employs a relevance function that assigns each document a score measuring its relevance: the higher the score, the more relevant the document is to the user query. Documents are then presented to the user in decreasing score order. In the proposed model, the relevance of each document to the user query is given by a relevance function relating document representations to the expanded fuzzy subquery qx^T. Document relevance is calculated by the product of each relation U_j and each partition qx^T_ij, as in equation 10, resulting in the set V_h of retrieved documents.

$$V_h = \bigvee_{i=1}^{K} \bigvee_{j=1}^{K} U_j\, qx^T_{ij} \tag{10}$$

Each relation U_j associates the documents in the collection with the concepts of domain D_j, for 1 ≤ j ≤ K. The set qx^T_ij represents the expansion of the concepts of partition q_i into domain D_j, 1 ≤ i, j ≤ K. It is composed of concepts from domain D_j, each with a value indicating the degree to which it is associated with the concepts in partition q_i. The arithmetic product U_j qx^T_ij gives the relevance of the documents associated with domain D_j that are related to partition q_i. It adjusts the associations of documents to domain D_j concepts (expressed in the relations U_j) according to the strength of the relationships between the concepts present in qx^T_ij. The symbol ⋁ designates union and denotes the max operator. The set V_h(v_l) covers all documents in the collection, and each value v_l indicates the degree of relevance of document d_l, 1 ≤ l ≤ |DOC|, to the initial user subquery. The final document answer set V is the intersection of the subquery answer sets V_h(v_l), as shown in equation 6. Users are usually more interested in a subset of V containing just the most relevant documents. To obtain it, a threshold value f establishes the minimum level of relevance required for a document to be included in the answer set. Only documents with V(v_l) ≥ f are presented to the user, in decreasing order of relevance. The following high-level algorithm summarizes the information retrieval model presented in the previous sections.
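Equation 10, the intersection of equation 6 and the threshold f can be sketched as follows. The product U_j qx^T_ij is taken to be an ordinary matrix-vector product, which is our reading of "arithmetic product". The relations, expanded partitions and second-subquery scores are hypothetical toy values, and the function names are ours.

```python
def mat_vec(U, x):
    # U_j qx^T_ij as an ordinary matrix-vector product: one score per document
    return [sum(u * v for u, v in zip(row, x)) for row in U]


def union(vectors):
    # Fuzzy union: element-wise max
    return [max(col) for col in zip(*vectors)]


def intersection(vectors):
    # Fuzzy intersection: element-wise min
    return [min(col) for col in zip(*vectors)]


def subquery_scores(U, qx):
    """V_h = union over (i, j) of U_j qx^T_ij (eq. 10).

    U[j] is the documents x D_j relation; qx[(i, j)] is the expanded partition.
    """
    return union([mat_vec(U[j], vec) for (_, j), vec in qx.items()])


def answer(V, f, names):
    # Keep documents with V(v_l) >= f, in decreasing order of relevance
    kept = [(name, score) for name, score in zip(names, V) if score >= f]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: three documents, D1 with two concepts, D2 with one.
U = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],   # U_1
    [[0.0], [0.0], [1.0]],                  # U_2
]
V1 = subquery_scores(U, {(0, 0): [1.0, 0.4], (0, 1): [0.6]})
V2 = [0.5, 0.9, 0.3]                         # scores for a second subquery
V = intersection([V1, V2])
print(V1, V)                                 # [1.0, 0.4, 0.6] [0.5, 0.4, 0.3]
print(answer(V, 0.35, ["d1", "d2", "d3"]))   # [('d1', 0.5), ('d2', 0.4)]
```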

Algorithm: Information Retrieval
Input: User query q_user
Output: Retrieved documents set V
Data: Domain sets D_k of concepts from lightweight ontologies and documents set DOC;
      the weighted transitive closure R*_Sk of fuzzy specialization relationships;
      the weighted transitive closure R*_Gk of fuzzy generalization relationships;
      the fuzzy positive associations R^P_ij between concepts of distinct domains;
      the relations U_k between the documents set DOC and the domains D_k (DOC × D_k);
      the values for weights w_P, w_G, w_S and the threshold value f
Generate the conjunctive normal form q_user = ⋀_{h=1}^{T_q} q_h;
foreach subquery q_h do
    Generate q_i partitions with concepts from each domain D_i;
    foreach partition q_i do
        foreach domain D_j do
            if i = j then
                The expansion value is equal to the partition value: qe_ij = q_i;
            else
                Expand q_i partition with concepts from domain D_j: qe_ij = w_P (q_i ∘ R^P_ij);
            end
        end
    end
    foreach qe_ij partition expansion do
        Transpose qe_ij, resulting in qe^T_ij;
    end
    foreach qe^T_ij transposed partition expansion do
        foreach domain D_j do
            Perform the expansion inside domain D_j:
                qx^T_ij = max(qe^T_ij, w_S (R*_Sj ∘ qe^T_ij), w_G (R*_Gj ∘ qe^T_ij));
        end
    end
    Calculate the documents set for partition q_i: V_qi = max_j U_j qx^T_ij;
    Calculate the documents set for subquery q_h: V_h = max_i V_qi;
end
Calculate the final document set V for user query q_user: V = min_h V_h;
Sort the document set V in decreasing document score order;
Present the documents ∈ V with score value greater than the threshold value f;
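The algorithm above can be condensed into the following Python sketch. It is a non-authoritative rendering under these assumptions: vectors and relations are plain lists, RP[(i, j)] holds R^P_ij, RS[j] and RG[j] hold the weighted closures R*_Sj and R*_Gj, U_j qx^T_ij is an ordinary matrix-vector product, and documents scoring at least f are kept, following Sect. 4.5. The small two-domain example data are invented for illustration.

```python
def image(A, R):
    # Row form of eq. 8: (A o R)(y) = max_x min(A(x), R(x, y))
    return [max(min(a, row[y]) for a, row in zip(A, R)) for y in range(len(R[0]))]


def col_image(R, v):
    # Column form used in eq. 9: (R o v)(x) = max_y min(R(x, y), v(y))
    return [max(min(r, b) for r, b in zip(row, v)) for row in R]


def retrieve(subqueries, RP, RS, RG, U, w_P, w_S, w_G, f):
    """Rank documents for a CNF query, following the Information Retrieval algorithm.

    subqueries: list of q_h, each a list of K partitions q_i (one per domain)
    RP[(i, j)]: fuzzy positive association R^P_ij (|D_i| x |D_j|)
    RS[j], RG[j]: weighted transitive closures R*_Sj and R*_Gj of domain D_j
    U[j]: documents x D_j relation.  Returns [(document index, score)].
    """
    K, n_docs = len(U), len(U[0])
    V_per_subquery = []
    for q_h in subqueries:
        V_h = [0.0] * n_docs
        for i, q_i in enumerate(q_h):
            for j in range(K):
                # First step (eq. 7): keep q_i in its own domain, else expand via R^P_ij
                if i == j:
                    qe = [float(v) for v in q_i]
                else:
                    qe = [w_P * v for v in image(q_i, RP[(i, j)])]
                # Second step (eq. 9): add more specific and more general concepts
                spec = col_image(RS[j], qe)
                gen = col_image(RG[j], qe)
                qx = [max(e, w_S * s, w_G * g) for e, s, g in zip(qe, spec, gen)]
                # Relevance (eq. 10): U_j qx^T_ij, accumulated with max
                scores = [sum(u * x for u, x in zip(row, qx)) for row in U[j]]
                V_h = [max(a, b) for a, b in zip(V_h, scores)]
        V_per_subquery.append(V_h)
    V = [min(col) for col in zip(*V_per_subquery)]          # eq. 6
    ranked = sorted(enumerate(V), key=lambda p: p[1], reverse=True)
    return [(d, s) for d, s in ranked if s >= f]            # threshold f


# Invented two-domain example: a0 is more general than a1; a1 is linked to b1.
RS = [[[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
RG = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
RP = {(0, 1): [[0.0, 0.0], [0.0, 0.5]], (1, 0): [[0.0, 0.0], [0.0, 0.8]]}
U = [[[1, 0], [0, 1], [0, 0]], [[0, 0], [0, 0], [0, 1]]]  # d0: a0, d1: a1, d2: b1

result = retrieve([[[0, 1], [0, 0]]], RP, RS, RG, U, w_P=0.7, w_S=0.7, w_G=0.3, f=0.2)
print([(d, round(s, 3)) for d, s in result])  # [(1, 1.0), (2, 0.35), (0, 0.3)]
```

Querying the specific concept a1 retrieves its own document first, then the document of the associated concept b1 (0.7 · 0.5), and finally the document of the more general concept a0 (0.3 · 1).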


Fig. 7 Brazilian geographical map with regions and states.

5 Information Retrieval Application

This section shows an application of the proposed information retrieval model to illustrate its use. First, the construction of a knowledge base composed of two related ontologies is presented. Based on this knowledge, the model structures are constructed and the proposed query expansion method is performed, taking into account a sample document collection associated with the concepts of the ontologies.

5.1 Knowledge Representation Construction

This application deals with knowledge about the agrometeorology domain. Consider the territorial division and the climate domains. These are distinct domains, but a relation between a territorial division and a climate classification can be established by observing geographic and climatic maps. Geographic domains are, in general, hierarchically organized and can be represented by domain ontologies. Fig. 7 shows a map of the Brazilian geographic territorial division, and Fig. 8 shows a map of the Köppen climate [56] distribution over the Brazilian territory [46]. The Brazilian geographic territorial division and the Köppen climate distribution over the country can be represented as distinct fuzzy lightweight ontologies. Fig. 9 shows the fuzzy ontology of the Brazilian territory, and Fig. 10 shows the fuzzy ontology of the Brazilian Köppen climate. In both ontologies, the weights between concepts are calculated from the spatial distribution over the maps and represent the degree to which the concepts are related.


Fig. 8 Brazilian map with the Köppen climate distribution over the country.

The Brazilian geographic territorial division ontology has three levels. Geographic entities are represented as concepts in the ontology. The root node is labeled 'Brazil', its descendant nodes are labeled with region names, and each region node has the respective state nodes as descendants. The spatial distribution between geographic entities on the map is represented as relationships between concepts in the ontology. The Brazilian geographic map shows that Pará State belongs to the North Region. In this case, there is an arc representing the relationship between the Pará concept and the North Region concept. The value 0.32 on the arc indicates that Pará State occupies 32% of the North Region, which denotes that Pará State specializes North Region with degree 0.32. Fig. 10 shows the fuzzy ontology representing the climate distribution over Brazil. Climate entities are represented as concepts in the ontology. The root node is labeled 'Climate', the root's descendant nodes are labeled with Brazilian zonal climates, and each zonal climate has the respective Köppen climate nodes as descendants. The spatial distribution between climate entities over the Brazilian territory is represented by fuzzy relationships between concepts in the ontology, whose values are obtained by scanning the maps. In the Köppen climate distribution over the Brazilian territory, the Am Köppen climate is associated with the Tropical climate. In this case, there is an arc representing the relationship between the Am concept and the Tropical concept. The value 0.57 assigned to the arc indicates that the Am Köppen climate corresponds to 57% of the total Tropical climate in Brazil, which denotes that the Am concept specializes the Tropical concept with degree 0.57. The relationship between climate entities and the geographic entities where the climate occurs is represented by fuzzy positive associations between concepts from both ontologies.
These ontologies can be related by establishing fuzzy relationships between territory and climate concepts based on their spatial distribution in the maps. Fig. 11 shows a sample from both ontologies and some of the stated relationships. The relationship is established at two levels: the first between Brazilian regions and zonal climates, and the second between Brazilian states and Köppen climates. The dashed lines in Fig. 11 illustrate both relationship levels. The relationship values are obtained by map scanning. For example, consider the relation between Pará State and the Am Köppen climate. The total amount of Am Köppen climate in Brazil is 33,833 pixels, and the amount of Am Köppen climate in Pará State is 14,093 pixels. Thus, the association between Pará State and the Am Köppen climate is given by the relation value 14,093/33,833 ≈ 0.42, meaning that the Pará concept implies the Am Köppen climate concept with strength 0.42. On the other hand, the extent of Pará State is 14,290 pixels, so the association between the Am Köppen climate and Pará State is given by the relation value 14,093/14,290 ≈ 0.99, meaning that the Am Köppen climate concept implies the Pará State concept with strength 0.99.

Fig. 9 Brazilian territory fuzzy lightweight ontology.

Fig. 10 Brazilian climate fuzzy lightweight ontology.

Fig. 11 Brazilian territory and Brazilian climate lightweight ontologies related by fuzzy positive associations.

Once the knowledge base with the related ontologies is built, the goal is to use this knowledge organization to retrieve documents semantically related to a user query. Consider a Brazilian agrometeorology document collection. The agrometeorology domain includes the climatic and geographic knowledge expressed in the related lightweight ontologies in Fig. 11. Assume a document sample as shown in Fig. 12, where documents are indexed by the concepts of the ontologies. In this scenario, a query like “North Region and Tropical” will not return any documents, because no document is indexed simultaneously by both concepts, nor by their ascendants or descendants in the ontologies. However, the relations between concepts from both ontologies show that Tropical climate is strongly related to North Region (strength 0.7). This relationship can be interpreted as follows: a document related to the Tropical climate concept is also relevant to the North Region concept, because Tropical climate is the predominant climate in the North Region. Likewise, the North Region descendant Pará State and the Tropical climate descendant Am Köppen climate are also highly related (strength 0.99). A document related to the Am Köppen climate concept is therefore also relevant to the Pará State concept, because the Am Köppen climate covers almost all of Pará State's territory. Because these concepts are descendants of North Region and Tropical climate respectively, documents related to them are also relevant to the query. Taking this into account, the documents in the set R = {Doc2, Doc7, Doc6, Doc1} are regarded as relevant to the query. This is the kind of knowledge the proposed information retrieval model exploits to improve the relevance of retrieved documents.
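The relation values 0.42 and 0.99 quoted for Pará and Am are simple pixel-count ratios. A quick Python check, using only the pixel counts given in the text (the helper name is ours):

```python
def relation_value(overlap_pixels, reference_pixels):
    """Share of a reference map entity covered by the overlap of two entities."""
    return overlap_pixels / reference_pixels


am_in_brazil = 33_833   # pixels of Am climate over Brazil
para_extent = 14_290    # pixels of Pará State
am_in_para = 14_093     # pixels of Am climate inside Pará State

para_implies_am = relation_value(am_in_para, am_in_brazil)  # R^P_12[Pará, Am]
am_implies_para = relation_value(am_in_para, para_extent)   # R^P_21[Am, Pará]
print(round(para_implies_am, 2), round(am_implies_para, 2))  # 0.42 0.99
```

The two directions use the same overlap but different reference areas, which is why the fuzzy positive association is not symmetric.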

5.2 Application Execution

The distinct domains are given by the Brazilian territorial division (D1) and the Brazilian climate distribution (D2). Documents are in the set DOC, as in Fig. 12.

$$D_1 = \{c_{11}: \text{Brazil},\ c_{12}: \text{North Region},\ c_{13}: \text{Pará},\ c_{14}: \text{Northeast Region},\ c_{15}: \text{Maranhão}\}$$

$$D_2 = \{c_{21}: \text{Climate},\ c_{22}: \text{Tropical},\ c_{23}: \text{Am},\ c_{24}: \text{Aw},\ c_{25}: \text{Semiarid},\ c_{26}: \text{BSh}\}$$

$$DOC = \{d_1: \text{Doc1},\ d_2: \text{Doc2},\ d_3: \text{Doc3},\ d_4: \text{Doc4},\ d_5: \text{Doc5},\ d_6: \text{Doc6},\ d_7: \text{Doc7},\ d_8: \text{Doc8},\ d_9: \text{Doc9},\ d_{10}: \text{Doc10},\ d_{11}: \text{Doc11}\}$$

Fig. 12 Documents indexed by the concepts from lightweight ontologies.

From Fig. 12, the fuzzy specialization and fuzzy generalization relations between concepts are extracted. The weighted transitive closure is calculated using the values we_S = 0.8 and we_G = 0.2. The fuzzy specialization and generalization relationships between concepts of domain D1, R_S1 and R_G1, and their respective weighted transitive closures R*_S1 and R*_G1 (rows and columns ordered c11, …, c15) are shown as follows:

$$R_{S_1} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0\\
0.45 & 0 & 0 & 0 & 0\\
0 & 0.32 & 0 & 0 & 0\\
0.18 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.21 & 0
\end{pmatrix}
\qquad
R_{G_1} = \begin{pmatrix}
0 & 0.45 & 0 & 0.18 & 0\\
0 & 0 & 0.32 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.21\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

$$R^{*}_{S_1} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0\\
0.45 & 0 & 0 & 0 & 0\\
0.256 & 0.32 & 0 & 0 & 0\\
0.18 & 0 & 0 & 0 & 0\\
0.144 & 0 & 0 & 0.21 & 0
\end{pmatrix}
\qquad
R^{*}_{G_1} = \begin{pmatrix}
0 & 0.45 & 0.064 & 0.18 & 0.036\\
0 & 0 & 0.32 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.21\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

The fuzzy specialization and generalization relationships between concepts of domain D2, R_S2 and R_G2, and their respective weighted transitive closures R*_S2 and R*_G2 (rows and columns ordered c21, …, c26) are shown as follows:

$$R_{S_2} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0.64 & 0 & 0 & 0 & 0 & 0\\
0 & 0.57 & 0 & 0 & 0 & 0\\
0 & 0.43 & 0 & 0 & 0 & 0\\
0.05 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1.0 & 0
\end{pmatrix}
\qquad
R_{G_2} = \begin{pmatrix}
0 & 0.64 & 0 & 0 & 0.05 & 0\\
0 & 0 & 0.57 & 0.43 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1.0\\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

$$R^{*}_{S_2} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0.64 & 0 & 0 & 0 & 0 & 0\\
0.456 & 0.57 & 0 & 0 & 0 & 0\\
0.344 & 0.43 & 0 & 0 & 0 & 0\\
0.05 & 0 & 0 & 0 & 0 & 0\\
0.04 & 0 & 0 & 0 & 1.0 & 0
\end{pmatrix}
\qquad
R^{*}_{G_2} = \begin{pmatrix}
0 & 0.64 & 0.114 & 0.086 & 0.05 & 0.01\\
0 & 0 & 0.57 & 0.43 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1.0\\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$

From Fig. 12, the fuzzy positive associations R^P_12 and R^P_21 between concepts of distinct domains are given as follows (the rows of R^P_12 are c11, …, c15 and its columns c21, …, c26; R^P_21 has rows c21, …, c26 and columns c11, …, c15):

$$R^P_{12} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.51 & 0 & 0 & 0 & 0\\
0 & 0 & 0.42 & 0 & 0 & 0\\
0 & 0.18 & 0 & 0 & 1.0 & 0\\
0 & 0 & 0 & 0.11 & 0 & 0
\end{pmatrix}
\qquad
R^P_{21} = \begin{pmatrix}
0 & 0 & 0 & 0 & 0\\
0 & 0.70 & 0 & 0.69 & 0\\
0 & 0 & 0.99 & 0 & 0\\
0 & 0 & 0 & 0 & 0.78\\
0 & 0 & 0 & 0.27 & 0\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
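The weighted transitive closures R*_S1, R*_G1, R*_S2 and R*_G2 listed earlier can be checked with a short Python sketch. The paper's own definition of the weighted transitive closure is not reproduced in this section. This sketch assumes the form R* = max(R, we · (R ∘ R)) with max-min composition, which suffices here because no path in either ontology has more than two arcs. It also assumes R_G is the transpose of R_S, as the matrices above show. The helper names are ours.

```python
def maxmin(A, B):
    # Max-min composition: (A o B)(x, z) = max_y min(A(x, y), B(y, z))
    return [[max(min(a, B[y][z]) for y, a in enumerate(row)) for z in range(len(B[0]))]
            for row in A]


def weighted_closure(R, we):
    # Assumed form: keep direct links and add two-arc paths scaled by we
    RR = maxmin(R, R)
    return [[max(r, we * rr) for r, rr in zip(row, rr_row)] for row, rr_row in zip(R, RR)]


def transpose(R):
    return [list(col) for col in zip(*R)]


def sparse(n, entries):
    M = [[0.0] * n for _ in range(n)]
    for (x, y), value in entries.items():
        M[x][y] = value
    return M


# R_S[x][y]: degree to which concept x specializes concept y; R_G is its transpose.
RS1 = sparse(5, {(1, 0): 0.45, (2, 1): 0.32, (3, 0): 0.18, (4, 3): 0.21})
RS2 = sparse(6, {(1, 0): 0.64, (2, 1): 0.57, (3, 1): 0.43, (4, 0): 0.05, (5, 4): 1.0})

RS1_star = weighted_closure(RS1, 0.8)
RG1_star = weighted_closure(transpose(RS1), 0.2)
RS2_star = weighted_closure(RS2, 0.8)
RG2_star = weighted_closure(transpose(RS2), 0.2)

print(round(RS1_star[2][0], 3), round(RS1_star[4][0], 3))  # 0.256 0.144
print(round(RG1_star[0][2], 3), round(RG1_star[0][4], 3))  # 0.064 0.036
print(round(RS2_star[2][0], 3), round(RS2_star[5][0], 3))  # 0.456 0.04
print(round(RG2_star[0][3], 3), round(RG2_star[0][5], 3))  # 0.086 0.01
```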

The relations U_i between the document set DOC and the concepts of domains D1 and D2 are given by U1 and U2 below (rows d1, …, d11). Every document-concept relation is assigned the value 1.0 so that the influence of the expansion process on document retrieval is clearly visible. This makes it easy to verify how the weights attributed to the concepts added to the query affect which documents are retrieved.

$$U_1 = \begin{pmatrix}
0 & 0 & 1.0 & 0 & 0\\
0 & 1.0 & 0 & 0 & 0\\
1.0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1.0 & 0\\
0 & 0 & 0 & 0 & 1.0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}
\qquad
U_2 = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1.0 & 0 & 0 & 0\\
0 & 1.0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1.0 & 0 & 0\\
1.0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1.0\\
0 & 0 & 0 & 0 & 1.0 & 0
\end{pmatrix}$$

Consider the user query q_user = North Region and Tropical. The query is already in conjunctive normal form, i.e., q_user ≡ q_h with h = 1, meaning that the query is composed of a single subquery. Documents are indexed as shown in Fig. 12 and, as discussed in Sect. 5.1, the set of documents relevant to the query is R = {Doc2, Doc7, Doc6, Doc1}. For this example, the following values are assumed: threshold f = 0.2, w_P = 0.7, w_S = 0.7 and w_G = 0.3. Following Sect. 4.4, the query expansion is performed in these steps:

1. Partitioning the query according to the domains yields q1 = [0 1 0 0 0] and q2 = [0 1 0 0 0 0], so q_user is represented as q_user = ([0 1 0 0 0] ∧ [0 1 0 0 0 0]).

2. Performing the expansion between concepts of distinct domains gives qe = ((qe_11 ∨ qe_12) ∧ (qe_21 ∨ qe_22)). Equation 7 gives (e.g., 0.357 = 0.7 · R^P_12[c12, c22] = 0.7 · 0.51):

$$\begin{aligned}
qe_{11} &= q_1 = [0\ 1\ 0\ 0\ 0]\\
qe_{12} &= 0.7\,\big(q_1 \circ R^P_{12}\big) = [0\ 0.357\ 0\ 0\ 0\ 0]\\
qe_{21} &= 0.7\,\big(q_2 \circ R^P_{21}\big) = [0\ 0.49\ 0\ 0.483\ 0]\\
qe_{22} &= q_2 = [0\ 1\ 0\ 0\ 0\ 0]
\end{aligned}$$

The expansion between domains results in qe = (([0 1 0 0 0] ∨ [0 0.357 0 0 0 0]) ∧ ([0 0.49 0 0.483 0] ∨ [0 1 0 0 0 0])).

3. Performing the expansion inside domains results in qx^T = (qx^T_11 ∨ qx^T_12) ∧ (qx^T_21 ∨ qx^T_22). Equation 9 gives (e.g., Pará receives 0.7 · R*_S1[c13, c12] = 0.7 · 0.32 = 0.224 in qx^T_11):

$$\begin{aligned}
qx^T_{11} &= \max\big(qe^T_{11},\ 0.7\,(R^{*}_{S_1}\circ qe^T_{11}),\ 0.3\,(R^{*}_{G_1}\circ qe^T_{11})\big)\\
&= \max\big([0\ 1\ 0\ 0\ 0]^T,\ [0\ 0\ 0.224\ 0\ 0]^T,\ [0.135\ 0\ 0\ 0\ 0]^T\big) = [0.135\ 1\ 0.224\ 0\ 0]^T\\
qx^T_{12} &= \max\big(qe^T_{12},\ 0.7\,(R^{*}_{S_2}\circ qe^T_{12}),\ 0.3\,(R^{*}_{G_2}\circ qe^T_{12})\big)\\
&= \max\big([0\ 0.357\ 0\ 0\ 0\ 0]^T,\ [0\ 0\ 0.249\ 0.249\ 0\ 0]^T,\ [0.107\ 0\ 0\ 0\ 0\ 0]^T\big) = [0.107\ 0.357\ 0.249\ 0.249\ 0\ 0]^T\\
qx^T_{21} &= \max\big(qe^T_{21},\ 0.7\,(R^{*}_{S_1}\circ qe^T_{21}),\ 0.3\,(R^{*}_{G_1}\circ qe^T_{21})\big)\\
&= \max\big([0\ 0.49\ 0\ 0.483\ 0]^T,\ [0\ 0\ 0.224\ 0\ 0.147]^T,\ [0.135\ 0\ 0\ 0\ 0]^T\big) = [0.135\ 0.49\ 0.224\ 0.483\ 0.147]^T\\
qx^T_{22} &= \max\big(qe^T_{22},\ 0.7\,(R^{*}_{S_2}\circ qe^T_{22}),\ 0.3\,(R^{*}_{G_2}\circ qe^T_{22})\big)\\
&= \max\big([0\ 1\ 0\ 0\ 0\ 0]^T,\ [0\ 0\ 0.399\ 0.301\ 0\ 0]^T,\ [0.192\ 0\ 0\ 0\ 0\ 0]^T\big) = [0.192\ 1\ 0.399\ 0.301\ 0\ 0]^T
\end{aligned}$$

The expansion inside domains results in:

$$qx^T = \big([0.135\ 1\ 0.224\ 0\ 0]^T \vee [0.107\ 0.357\ 0.249\ 0.249\ 0\ 0]^T\big) \wedge \big([0.135\ 0.49\ 0.224\ 0.483\ 0.147]^T \vee [0.192\ 1\ 0.399\ 0.301\ 0\ 0]^T\big)$$

The expansion corresponding to the North Region concept, including the new concepts and their weights, results in:

$$qx^T_{\text{North Region}} = \{\text{Brazil}: 0.135,\ \text{North Region}: 1,\ \text{Pará}: 0.224,\ \text{Northeast Region}: 0,\ \text{Maranhão}: 0\} \vee \{\text{Climate}: 0.107,\ \text{Tropical}: 0.357,\ \text{Am}: 0.249,\ \text{Aw}: 0.249,\ \text{Semiarid}: 0,\ \text{BSh}: 0\}$$

The expansion corresponding to the Tropical concept, including the new concepts and their weights, results in:

$$qx^T_{\text{Tropical}} = \{\text{Brazil}: 0.135,\ \text{North Region}: 0.49,\ \text{Pará}: 0.224,\ \text{Northeast Region}: 0.483,\ \text{Maranhão}: 0.147\} \vee \{\text{Climate}: 0.192,\ \text{Tropical}: 1,\ \text{Am}: 0.399,\ \text{Aw}: 0.301,\ \text{Semiarid}: 0,\ \text{BSh}: 0\}$$

In both cases, concepts with an associated value of 0 are not added to the query. The concepts of the initial user query keep their value of 1, so no concept added by the expansion process receives a larger value.

4. Calculating the most relevant documents for the expanded query:

$$V = \big(U_1\,qx^T_{11} \vee U_2\,qx^T_{12}\big) \wedge \big(U_1\,qx^T_{21} \vee U_2\,qx^T_{22}\big)$$

Equation 10 gives, with components ordered d1, …, d11:

$$\begin{aligned}
V = {} & \big([0.224\ 1.0\ 0.135\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T \vee [0\ 0\ 0\ 0\ 0\ 0.249\ 0.357\ 0.249\ 0.107\ 0\ 0]^T\big)\\
& \wedge \big([0.224\ 0.49\ 0.135\ 0.483\ 0.147\ 0\ 0\ 0\ 0\ 0\ 0]^T \vee [0\ 0\ 0\ 0\ 0\ 0.399\ 1.0\ 0.301\ 0.192\ 0\ 0]^T\big)
\end{aligned}$$

The final document set (V)_{0.2}, after applying the threshold f = 0.2, is given by:

$$\begin{aligned}
V &= [0.224\ 1.0\ 0.135\ 0\ 0\ 0.249\ 0.357\ 0.249\ 0.107\ 0\ 0]^T \wedge [0.224\ 0.49\ 0.135\ 0.483\ 0.147\ 0.399\ 1.0\ 0.301\ 0.192\ 0\ 0]^T\\
&= [0.224\ 0.49\ 0.135\ 0\ 0\ 0.249\ 0.357\ 0.249\ 0.107\ 0\ 0]^T
\end{aligned}$$

$$(V)_{0.2} = \{\text{Doc1}: 0.224,\ \text{Doc2}: 0.49,\ \text{Doc3}: 0,\ \text{Doc4}: 0,\ \text{Doc5}: 0,\ \text{Doc6}: 0.249,\ \text{Doc7}: 0.357,\ \text{Doc8}: 0.249,\ \text{Doc9}: 0,\ \text{Doc10}: 0,\ \text{Doc11}: 0\}$$

After applying the threshold f = 0.2 and sorting the document scores, the answer list A = {Doc2, Doc7, Doc6, Doc8, Doc1} is presented to the user. The set of relevant documents for the initial query is R = {Doc2, Doc7, Doc6, Doc1}, so all four relevant documents are retrieved. The only non-relevant document returned is Doc8 (indexed by Aw), ranked fourth with a score of 0.249. The proposed information retrieval model is therefore capable of retrieving the relevant documents for this query.
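The whole worked example can be reproduced with a short Python sketch that runs the Information Retrieval algorithm on the matrices of Sect. 5.2. Indices are 0-based, so c11 is index 0 of D1 and d1 is index 0 of DOC. The sketch treats North Region and Tropical as two CNF clauses, which yields exactly the combination (U1 qx^T_11 ∨ U2 qx^T_12) ∧ (U1 qx^T_21 ∨ U2 qx^T_22) used in step 4. As in the earlier sketches, U_j qx^T_ij is assumed to be an ordinary matrix-vector product and the helper names are ours. Scores are not truncated, so 0.7 · 0.7 · 0.51 = 0.2499 appears where the text shows 0.249. The ranking is the same.

```python
def image(A, R):
    # (A o R)(y) = max_x min(A(x), R(x, y))  (eq. 8)
    return [max(min(a, row[y]) for a, row in zip(A, R)) for y in range(len(R[0]))]


def col_image(R, v):
    # (R o v)(x) = max_y min(R(x, y), v(y)), used in eq. 9
    return [max(min(r, b) for r, b in zip(row, v)) for row in R]


def retrieve(subqueries, RP, RS, RG, U, w_P, w_S, w_G, f):
    # Eqs. 7, 9 and 10 for each subquery, then eq. 6 and the threshold f
    V_per_subquery = []
    for q_h in subqueries:
        V_h = [0.0] * len(U[0])
        for i, q_i in enumerate(q_h):
            for j in range(len(U)):
                if i == j:
                    qe = [float(v) for v in q_i]
                else:
                    qe = [w_P * v for v in image(q_i, RP[(i, j)])]
                spec, gen = col_image(RS[j], qe), col_image(RG[j], qe)
                qx = [max(e, w_S * s, w_G * g) for e, s, g in zip(qe, spec, gen)]
                scores = [sum(u * x for u, x in zip(row, qx)) for row in U[j]]
                V_h = [max(a, b) for a, b in zip(V_h, scores)]
        V_per_subquery.append(V_h)
    V = [min(col) for col in zip(*V_per_subquery)]
    ranked = sorted(enumerate(V), key=lambda p: p[1], reverse=True)
    return [(d, s) for d, s in ranked if s >= f]


def sparse(rows, cols, entries):
    # Dense matrix from {(row, col): value}; 0-based indices
    M = [[0.0] * cols for _ in range(rows)]
    for (r, c), value in entries.items():
        M[r][c] = value
    return M


# D1: c11 Brazil, c12 North Region, c13 Pará, c14 Northeast Region, c15 Maranhão
# D2: c21 Climate, c22 Tropical, c23 Am, c24 Aw, c25 Semiarid, c26 BSh
RP = {
    (0, 1): sparse(5, 6, {(1, 1): 0.51, (2, 2): 0.42, (3, 1): 0.18, (3, 4): 1.0, (4, 3): 0.11}),
    (1, 0): sparse(6, 5, {(1, 1): 0.70, (1, 3): 0.69, (2, 2): 0.99, (3, 4): 0.78, (4, 3): 0.27}),
}
RS = [
    sparse(5, 5, {(1, 0): 0.45, (2, 0): 0.256, (2, 1): 0.32, (3, 0): 0.18, (4, 0): 0.144, (4, 3): 0.21}),
    sparse(6, 6, {(1, 0): 0.64, (2, 0): 0.456, (2, 1): 0.57, (3, 0): 0.344, (3, 1): 0.43,
                  (4, 0): 0.05, (5, 0): 0.04, (5, 4): 1.0}),
]
RG = [
    sparse(5, 5, {(0, 1): 0.45, (0, 2): 0.064, (0, 3): 0.18, (0, 4): 0.036, (1, 2): 0.32, (3, 4): 0.21}),
    sparse(6, 6, {(0, 1): 0.64, (0, 2): 0.114, (0, 3): 0.086, (0, 4): 0.05, (0, 5): 0.01,
                  (1, 2): 0.57, (1, 3): 0.43, (4, 5): 1.0}),
]
U = [
    sparse(11, 5, {(0, 2): 1.0, (1, 1): 1.0, (2, 0): 1.0, (3, 3): 1.0, (4, 4): 1.0}),
    sparse(11, 6, {(5, 2): 1.0, (6, 1): 1.0, (7, 3): 1.0, (8, 0): 1.0, (9, 5): 1.0, (10, 4): 1.0}),
]

# "North Region AND Tropical" as two CNF clauses, each partitioned by domain.
query = [[[0, 1, 0, 0, 0], [0] * 6], [[0] * 5, [0, 1, 0, 0, 0, 0]]]
answer = retrieve(query, RP, RS, RG, U, w_P=0.7, w_S=0.7, w_G=0.3, f=0.2)
print([(f"Doc{d + 1}", round(s, 4)) for d, s in answer])
# [('Doc2', 0.49), ('Doc7', 0.357), ('Doc6', 0.2499), ('Doc8', 0.2499), ('Doc1', 0.224)]
```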

6 Model Evaluation

The model evaluation uses a document collection sample of the agrometeorology domain in Brazil; a query set; a lightweight ontology of the geographical Brazilian territory; and a lightweight ontology of the climate distribution over the Brazilian territory. Both ontologies are manually constructed, as discussed in Sect. 5.1.

6.1 Document Sample Collection and Query Sample Construction

The document sample collection is composed of 129 documents selected from a collection of 17,780 documents in Portuguese from the agrometeorology domain, available in the Agricultural Research Data Base [9] of the Brazilian Agricultural Research Corporation (Embrapa) [10]. The sample includes documents containing each of the ontology concepts separately, as well as documents containing combinations of concepts from both ontologies. As the domain ontologies relate to climate and geographic regions in Brazil, an initial search for documents containing the keyword “climate” in the title or abstract metadata returned a subset of about 3,000 documents. Starting from this subset, a search for each concept c_ky in the ontologies is executed and the returned document set DOC_ky is stored, where 1 ≤ k ≤ K, K is the number of domains, and 1 ≤ y ≤ |D_k|. In this case K = 2, |D1| = 32 and |D2| = 14. The total number of documents returned for all concepts is 676. The contribution Con_ky ∈ [0, 1] that each concept c_ky, through its document set DOC_ky, makes to the total number of returned documents is given by equation 11, where ∑_{a=1}^{K} ∑_{x=1}^{|D_a|} |DOC_ax| = 676.

$$Con_{ky} = \frac{|DOC_{ky}|}{\sum_{a=1}^{K} \sum_{x=1}^{|D_a|} |DOC_{ax}|} \tag{11}$$
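Equation 11, together with the sample sizes Num_ky = 129 · Con_ky defined in the next paragraph, amounts to a proportional allocation. A small Python sketch with hypothetical per-concept counts (not the study's data). Rounding Num_ky to the nearest integer is an assumption, since the text does not say how fractions were handled; the function names are ours.

```python
def contributions(doc_counts):
    """Con_ky = |DOC_ky| / (sum of |DOC_ax| over all concepts of all domains), eq. 11."""
    total = sum(doc_counts.values())
    return {concept: n / total for concept, n in doc_counts.items()}


def sample_sizes(con, sample_size=129):
    # Num_ky = 129 * Con_ky; rounding to the nearest integer is an assumption
    return {concept: round(sample_size * c) for concept, c in con.items()}


# Hypothetical |DOC_ky| counts for three concepts (not the study's data).
counts = {"Pará": 40, "Am": 25, "Tropical": 35}
con = contributions(counts)
print(con)                # {'Pará': 0.4, 'Am': 0.25, 'Tropical': 0.35}
print(sample_sizes(con))  # {'Pará': 52, 'Am': 32, 'Tropical': 45}
```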


A sample of about 100 documents was chosen for the initial tests. The aim was to demonstrate the potential of the proposed knowledge organization and expansion method while keeping the experiment under control. This sample size was chosen because a domain expert had to assign the relevance of each concept to each document in the sample. After analyzing the top returned documents, and making sure that the sample contained documents related to all ontology concepts, 129 of the 676 documents were selected. The number $Num_{ky}$ of documents related to concept $c_{ky}$ needed to compose the 129-document sample collection is given by $Num_{ky} = 129 \cdot Con_{ky}$, where $\cdot$ denotes the arithmetic product. These $Num_{ky}$ documents are taken from the top of the returned set $DOC_{ky}$. The resulting sample includes all the concepts from the constructed ontologies and was found adequate to demonstrate the potential of the proposed method. It is also small enough for a domain expert to assign relevant documents to the queries.

The query set contains 83 queries. It includes queries with a single concept from each ontology, as well as queries combining two concepts from both ontologies, at different levels, connected by the Boolean operators AND or OR. To associate relevant documents with each query, a domain expert examines each document in the sample collection and reassigns it to ontology concepts, creating a new document set $DOCN_{ky}$ for each concept $c_{ky}$. The assignments follow these directives:
– For both ontologies, a document related to a more specific concept is also assigned to its general concept. Documents related to a state concept are therefore also assigned to the associated region concept, and documents related to a Köppen climate concept are also assigned to the associated zonal climate concept. For example, a document containing the Pará concept is also assigned to the North Region concept, and a document containing the Am concept is also assigned to the Tropical concept.
– For the climate ontology, a document related to a zonal climate concept is also assigned to its more specific Köppen climate concepts, because the zonal climate represents the general features of the more specific concepts. For example, documents containing only the Tropical concept are assigned to the Aw and Am concepts.
– A document related to both a zonal climate concept and a territorial concept is also assigned to the specific Köppen climate concept occurring in that territory (this knowledge is obtained from the map). For example, a document containing the Tropical concept and the Pará concept is also assigned to the Am concept, since Am is the specific Tropical climate occurring in Pará State.
– Other associations, established from the domain expert's knowledge and from observation of the map, are also reflected in the assignment of documents to concepts. For example, the Bsh Köppen climate and its corresponding Semiarid zonal climate occur in some states of the Northeast Region. All documents containing only the Semiarid or Bsh concepts are therefore assigned to the Northeast Region concept and to the states where this climate occurs. Likewise, the Cwb Köppen climate occurs only in Minas Gerais State, in the Southeast Region, so documents containing the Cwb concept are assigned to the Minas Gerais State and Southeast Region concepts.

Once the documents have been reassigned to concepts, the relevant documents for each query are obtained by set operations on the $DOCN_{ky}$ sets, depending on the query type. For a query with a single concept $c_{ky}$, the relevant set is $DOCN_{ky}$. For a query of the form $c_{ki}$ AND $c_{kj}$, where $1 \le k \le 2$ and $1 \le i, j \le |D_k|$, the relevant set is $DOCN_{ki} \cap DOCN_{kj}$. For a query of the form $c_{ki}$ OR $c_{kj}$, it is $DOCN_{ki} \cup DOCN_{kj}$. The sample document collection, the example queries and the relations constructed for the ontologies are available for consultation [27].
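
The relevance judgments above reduce to set operations. The following Python sketch is an illustration only. It propagates expert assignments from specific concepts to their general concepts (the first directive), then derives the relevant set for single-concept, AND and OR queries. The function names, the `parent` map and the document ids are hypothetical. The remaining directives depend on expert and map knowledge and are not modeled.

```python
from typing import Dict, Optional, Set

DocSets = Dict[str, Set[int]]  # concept name -> DOCN set of document ids (hypothetical encoding)


def propagate_up(docn: DocSets, parent: Dict[str, str]) -> DocSets:
    """Also assign each document of a specific concept to all its more general concepts."""
    out: DocSets = {c: set(docs) for c, docs in docn.items()}
    for concept, docs in docn.items():
        p = parent.get(concept)
        while p is not None:  # walk up the hierarchy to the root
            out.setdefault(p, set()).update(docs)
            p = parent.get(p)
    return out


def relevant_docs(docn: DocSets, a: str, op: Optional[str] = None,
                  b: Optional[str] = None) -> Set[int]:
    """Relevant set for 'a', 'a AND b' (intersection) or 'a OR b' (union)."""
    left = docn.get(a, set())
    if op is None:
        return set(left)
    right = docn.get(b, set()) if b is not None else set()
    if op == "AND":
        return left & right
    if op == "OR":
        return left | right
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical toy data: a fragment of the hierarchy and some expert assignments.
parent = {"Para": "North Region", "Am": "Tropical", "Aw": "Tropical"}
docn = propagate_up({"Para": {1, 2}, "Am": {2, 3}, "Aw": {4}, "Tropical": {5}}, parent)

print(sorted(docn["North Region"]))                            # [1, 2]
print(sorted(docn["Tropical"]))                                # [2, 3, 4, 5]
print(sorted(relevant_docs(docn, "Para", "AND", "Tropical")))  # [2]
print(sorted(relevant_docs(docn, "Para", "OR", "Aw")))         # [1, 2, 4]
```

Propagation is applied once, before any query is evaluated. This lets AND and OR queries use plain intersection and union over the already expanded sets.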

6.2 Experimental Results Analysis

A number of experiments were conducted to investigate the performance of the proposed model. First, the knowledge representation and the query expansion method were tested using the Apache Lucene engine [25]. Lucene allows a search concept to be boosted, which increases the relevance of the documents indexed by that concept. Next, the performance of the proposed information retrieval model was compared with a similar approach, the multi-relationship fuzzy concept network information retrieval model [26]. To compare the proposed model with a non-fuzzy approach, further tests were carried out using three systems: the information retrieval application based on multi-related ontologies, the multi-relationship fuzzy concept network information retrieval model, and the Lucene engine using keywords only [24]. This paper describes the whole model, including the knowledge base representation, the query expansion method, the information retrieval model itself, and the construction of the ontologies, the document collection and the example queries. In addition, the experiments consider not only fuzzy ontologies but also crisp ones, in order to investigate how the framework performs when the domain ontologies are not fuzzy. As shown in this section, the outcome is encouraging, and the proposed model could be applied more widely, since crisp ontologies are currently the most common kind.

Several experiments were run with many combinations of the weights $we_t$, $t \in \{S, G\}$, and $w_r$, $r \in \{S, G, P\}$, using both fuzzy and crisp ontologies. The crisp version of each fuzzy domain ontology is obtained by setting to 1.0 the weights of all specialization and generalization associations stated in the fuzzy ontology. Testing this crisp version shows how the model performs when no fuzzy degrees are available.
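
Deriving the crisp ontology is a simple mapping over the stated associations. The following minimal Python sketch assumes a hypothetical representation in which each association is keyed by (source concept, target concept, type) and carries a membership degree in [0, 1]. The degrees shown are invented for illustration.

```python
from typing import Dict, Tuple

# Hypothetical encoding: (source, target, type) -> membership degree in [0, 1],
# where type "S" is specialization and "G" is generalization.
Ontology = Dict[Tuple[str, str, str], float]


def to_crisp(fuzzy: Ontology) -> Ontology:
    """Return a copy in which every stated S or G association has weight 1.0."""
    crisp: Ontology = {}
    for (src, dst, kind), degree in fuzzy.items():
        if kind in ("S", "G") and degree > 0.0:
            crisp[(src, dst, kind)] = 1.0      # stated association becomes crisp
        else:
            crisp[(src, dst, kind)] = degree   # anything else is left unchanged
    return crisp


fuzzy = {
    ("Tropical", "Am", "S"): 0.8,       # invented degrees, for illustration only
    ("Am", "Tropical", "G"): 0.6,
    ("North Region", "Para", "S"): 0.9,
}
print(to_crisp(fuzzy))
```

The conversion keeps the same set of associations and changes only their weights. The crisp and fuzzy runs therefore differ only in the degrees used during expansion.
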
Because adding general concepts tends to increase noise in the search results, fuzzy generalization associations receive a lower weight (e.g., $w_G = 0.3$) and fuzzy specialization associations a higher one (e.g., $w_S = 0.7$). The same reasoning applies to the weighted transitive closure calculation. Tests showed that the best results are achieved with a lower generalization weight $we_G$ and a higher specialization weight $we_S$, such as $we_G = 0.2$ and $we_S = 0.8$. The fuzzy positive association was tested with four different weights: $w_P = 0.0$, $0.1$, $0.5$ and $1.0$. Across these tests, each model showed its own characteristic behavior with respect to precision and recall. Recall is the fraction of all relevant documents in the collection for a query q that are retrieved. Precision is the fraction of the retrieved documents (the answer set) that are relevant to q [2]. The performance of the models is presented as precision versus recall curves. The proposed information retrieval model based on multi-related ontologies is represented by the MO curves, the multi-relationship fuzzy concept network model by the CN curve, and Apache Lucene by the LUC curves. Fig. 13 shows precision versus recall curves for the three tested models. For the model based on multi-related ontologies (MO) and for Apache Lucene (LUC), the curves reflect query expansion using all the relationships in the ontologies, for both fuzzy and crisp ontologies. Thus, the “MO Fuzzy” curve denotes the information retrieval model based on multi-related ontologies, employing fuzzy ontologies and using generalization (G), specialization (S) and positive (P) associations in query expansion. The fuzzy concept network (CN) curve uses the automatically constructed multi-relationship fuzzy concept network knowledge base.
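
The precision and recall definitions above, and the way a per-query ranking becomes a precision versus recall curve, can be sketched in Python. The section does not state how the curves were interpolated. This sketch assumes the common 11-point interpolated precision, averaged over queries. At each recall level 0%, 10%, ..., 100%, interpolated precision is the maximum precision at any recall greater than or equal to that level. The ranking and relevance set used here are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall(answer: Set[int], relevant: Set[int]) -> Tuple[float, float]:
    """Precision = |answer & relevant| / |answer|; recall = |answer & relevant| / |relevant|."""
    hits = len(answer & relevant)
    precision = hits / len(answer) if answer else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def eleven_point(ranked: Sequence[int], relevant: Set[int]) -> List[float]:
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one ranked answer list."""
    points, hits = [], 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    levels = [j / 10 for j in range(11)]
    return [max((p for r, p in points if r >= lvl), default=0.0) for lvl in levels]


def average_curve(curves: List[List[float]]) -> List[float]:
    """Average the 11-point curves of several queries, level by level."""
    return [sum(values) / len(curves) for values in zip(*curves)]


# Hypothetical query: relevant documents {1, 3, 5, 7}, ranked answer list below.
ranked = [1, 2, 3, 4, 5, 6, 8, 7]
relevant = {1, 3, 5, 7}
print(precision_recall(set(ranked[:4]), relevant))    # (0.5, 0.5)
print([round(x, 2) for x in eleven_point(ranked, relevant)])
# [1.0, 1.0, 1.0, 0.67, 0.67, 0.67, 0.6, 0.6, 0.5, 0.5, 0.5]
```

Interpolation makes each per-query curve non-increasing. This is what allows curves from many queries to be averaged level by level into a single line per model.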

[Figure: precision (%) versus recall (%), both axes from 0 to 100; curves: MO Fuzzy, Luc Fuzzy, MO Crisp, Luc Crisp, CN]

Fig. 13 Recall and precision measures using both fuzzy and crisp ontologies.

Fig. 13 shows that the proposed model performs better than the others. With fuzzy ontologies, its precision is above 95% at low recall values and remains above 50% at higher recall values. With crisp ontologies, precision is above 95% at low recall values and remains above 43% at higher recall values. Apache Lucene generally exhibits lower precision. With fuzzy ontologies, its precision is above 85% at low recall values and remains above 49% at higher recall values. With crisp ontologies, it is above 80% at low recall values and remains above 47% at higher recall values. The fuzzy concept network model exhibits precision above 90% at low recall values and keeps precision above 27% at higher recall values. Overall, it displays lower precision than the other models. Because its knowledge base is generated automatically, it does not capture semantic content as well as a knowledge base built from human knowledge does. Even so, the results are promising for an automatically generated knowledge base. The weights between concepts in the generated knowledge base are inherently fuzzy and cannot be computed as crisp values, so CN appears only in its fuzzy form. Fig. 14 presents a more detailed view of the performance of the proposed model and Apache Lucene using fuzzy ontologies, and Fig. 15 presents the corresponding results using crisp ontologies. In the graph legends, KW denotes the use of the entered keywords only (i.e., without query expansion), G the generalization association, S the specialization association and P the positive association. The purpose is to determine whether adding the positive association to the specialization and generalization associations in the query expansion method improves information retrieval performance. In both graphs, the multi-related ontologies model and Apache Lucene use the same query expansion method, so their curves follow the same trend.

[Figure: precision (%) versus recall (%), both axes from 0 to 100; curves: MO KW, MO GS, MO GSP, LUC KW, LUC GS, LUC GSP]

Fig. 14 Recall and precision measures using fuzzy ontologies.

[Figure: precision (%) versus recall (%), both axes from 0 to 100; curves: MO KW, MO GS, MO GSP, LUC KW, LUC GS, LUC GSP]

Fig. 15 Recall and precision measures using crisp ontologies.

When only the user-entered keywords are taken into account, precision is high at low recall values but decreases rapidly as recall increases. The curve for Apache Lucene with keywords only (LUC KW) illustrates the performance of a non-fuzzy system. When both the specialization (S) and generalization (G) associations are taken into account, precision remains higher than with keywords alone, but starts decreasing quickly once recall reaches about 50%. When the positive association (P) is also taken into account, precision is high at low recall values and stays around 50% for fuzzy ontologies and around 45% for crisp ontologies. This indicates that the positive association brings more relevant documents to the top of the answer set. The knowledge base composed of multi-related ontologies thus improves the quality of information retrieval, yielding higher precision at the same recall levels. Both the proposed model and Apache Lucene achieved better results with fuzzy ontologies than with crisp ontologies.
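
The KW, GS and GSP configurations differ only in which association types take part in query expansion. The Python sketch below illustrates this under explicit assumptions. It uses a hypothetical one-step rule: a related concept receives weight $w_r$ times its association degree, and the maximum is kept when several associations reach the same concept. This is not the expansion method defined earlier in the paper, and it ignores the weighted transitive closure. The values $w_G = 0.3$, $w_S = 0.7$ and $w_P = 0.5$ come from the tested settings. The association degrees are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical association: (query concept, related concept, type, degree),
# type in {"G", "S", "P"} (generalization, specialization, positive).
Assoc = Tuple[str, str, str, float]


def expand(query: List[str], assocs: List[Assoc], w: Dict[str, float]) -> Dict[str, float]:
    """One-step expansion using an illustrative product/max rule, not the paper's method."""
    weights = {c: 1.0 for c in query}  # entered keywords keep full weight
    for src, dst, kind, degree in assocs:
        score = w.get(kind, 0.0) * degree  # association types absent from w contribute nothing
        if src in query and score > 0.0:
            weights[dst] = max(weights.get(dst, 0.0), score)
    return weights


assocs = [
    ("Para", "North Region", "G", 1.0),  # invented degrees
    ("Para", "Am", "P", 0.8),
    ("Tropical", "Am", "S", 0.9),
]
configs = {
    "KW": {},                               # keywords only, no expansion
    "GS": {"G": 0.3, "S": 0.7},             # equivalent to w_P = 0
    "GSP": {"G": 0.3, "S": 0.7, "P": 0.5},  # w_P = 0.5, one of the tested values
}
for name, w in configs.items():
    print(name, expand(["Para"], assocs, w))
```

In this toy run, only the GSP configuration reaches the climate concept Am from the territorial query concept. This mirrors the role attributed above to the positive association, which links concepts across ontologies.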


7 Conclusions

This work presents an approach to knowledge organization that employs a knowledge base composed of multi-related domain ontologies. The innovation of this approach is that the ontologies can cover knowledge from distinct domains. Other approaches use a knowledge base composed of a single ontology. In contrast, the proposed model exploits knowledge expressed in multiple ontologies that, in some contexts, can be related to each other by causal, spatial or similarity relationships. The ontologies do not need to overlap in order to be integrated. To deal with the uncertainty and vagueness present in knowledge, fuzzy set theory is used to express the relations between concepts within an ontology and between concepts from distinct ontologies. Knowledge organization and its representation as ontologies is a growing area, and many independently developed crisp or fuzzy ontologies representing distinct domains have been proposed recently. The model presented here offers a new way to reuse these ontologies. Instead of developing one large ontology encoding multidisciplinary knowledge, the knowledge is encoded as distinct domain ontologies, which are then related in a separate, subsequent step. This allows distinct knowledge groups to work independently and existing domain ontologies to be reused. If existing ontologies represent domains that can be related in some context, only the positive fuzzy associations between them need to be developed. These associations are represented externally and independently, leaving the domain ontologies unchanged. Although the usage example involves just two related ontologies, the model is general and allows any number of ontologies. This knowledge organization is used in the presented information retrieval model, together with a new method, based on this knowledge, for expanding the user query.
The evaluation shows that the proposed model performs better than the multi-relationship fuzzy concept network information retrieval model. The manually constructed knowledge base provides semantic knowledge that leads to good retrieval performance, compared with a model whose knowledge base is built from syntactic word co-occurrence only. When compared with the Apache Lucene search engine, the proposed model also displayed better results. Work is ongoing on an interactive method to build the ontologies and their relations semi-automatically. Machine learning techniques, such as genetic algorithms and clustering, are being used to extract and relate concepts from corpora of documents in a given knowledge area. Initial tests indicate that this can be useful, particularly for establishing fuzzy positive associations between concepts of different ontologies. This semi-automatic strategy is expected to help the information retrieval model scale to larger ontologies. Issues to be addressed in future work include scalability, performance and domain generalization. Several extensions remain to be explored: integrating more than two ontologies, integrating ontologies with a larger number of concepts, and integrating larger, well-known document collections. The impact of these changes on performance, in terms of time and space complexity, also remains to be assessed. Applying the algorithm to other knowledge domains and their associated document collections is also necessary to ascertain the model's behavior. To allow reuse of the encoded knowledge base, a formal standard language for representing fuzzy ontologies should be adopted. It could build on standard Web ontology languages, for example Fuzzy OWL 2 [4] or Fuzzy RDF. Another point of interest is taking into account the evolution of knowledge over time.
In this direction, Straccia et al. [53] present a general framework for representing and reasoning with fuzzy RDF data annotated with an additional term, such as time. The proposed extension requires dealing with quadruples rather than triples, with the additional term having specific semantics and operational behaviour. A further complicating factor is polysemy. According to WordNet [22], polysemy, also known as lexical ambiguity, refers to the ambiguity of an individual word or phrase that can be used (in different contexts) to express two or more different meanings. Words are used as names of entities, and several entities can share the same name, causing confusion. This is often referred to as the name disambiguation problem, and there are proposals to solve it [13,38]. How to handle disambiguation in the proposed model remains to be investigated.

Acknowledgements This work received partial financial support from CAPES - Brazil.

References
1. Apache Project: Apache Lucene Overview. Internet page, The Apache Software Foundation (2009). http://lucene.apache.org/java/docs/index.html
2. Baeza-Yates, R.A., Ribeiro-Neto, B.A.: Modern Information Retrieval. ACM Press / Addison-Wesley (1999). URL sunsite.dcc.uchile.cl/irbook/
3. Bhogal, J., Macfarlane, A., Smith, P.: A review of ontology based query expansion. Information Processing and Management 43(4), 866–886 (2007). DOI 10.1016/j.ipm.2006.09.003
4. Bobillo, F., Straccia, U.: An OWL ontology for fuzzy OWL 2. In: J. Rauch, Z. Ras, P. Berka, T. Elomaa (eds.) Foundations of Intelligent Systems, Lecture Notes in Computer Science, vol. 5722, pp. 151–160. Springer Berlin / Heidelberg (2009)
5. Broughton, V.: Faceted classification as a basis for knowledge organization in a digital environment; the Bliss Bibliographic Classification as a model for vocabulary management and the creation of multidimensional knowledge structures. New Review of Hypermedia and Multimedia 7(1), 67–102 (2001). DOI 10.1080/13614560108914727
6. Chen, S.M., Horng, Y.J., Lee, C.H.: Fuzzy information retrieval based on multi-relationship fuzzy concept networks. Fuzzy Sets and Systems 140(1), 183–205 (2003)
7. Choi, N., Song, I.Y., Han, H.: A survey on ontology mapping. SIGMOD Rec. 35(3), 34–41 (2006). DOI 10.1145/1168092.1168097
8. Cruz, I.F., Rajendran, A.: Semantic data integration in hierarchical domains. IEEE Intelligent Systems 18(2), 66–73 (2003). DOI 10.1109/MIS.2003.1193659
9. Embrapa: Bases de Dados da Pesquisa Agropecuária. Internet page, Empresa Brasileira de Pesquisa Agropecuária (2008). http://www.bdpa.cnptia.embrapa.br/
10. Embrapa: Brazilian Agricultural Research Corporation. Internet page, Embrapa (2009). http://www.embrapa.br/english
11. Ensan, F., Du, W.: A knowledge encapsulation approach to ontology modularization. Knowledge and Information Systems 26, 249–283 (2011). DOI 10.1007/s10115-009-0279-y
12. Finin, T., Mayfield, J., Joshi, A., Cost, R.S., Fink, C.: Information retrieval and the semantic web. In: HICSS ’05: Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS’05) - Track 4, p. 113.1. IEEE Computer Society, Washington, DC, USA (2005)
13. Fodeh, S., Punch, B., Tan, P.N.: On ontology-driven document clustering using core semantic features. Knowledge and Information Systems pp. 1–27 (2011). DOI 10.1007/s10115-010-0370-4
14. Gomez-Pérez, A., Fernández-Lopez, M., Corcho, O.: Ontological Engineering. Springer-Verlag (2003)
15. Hernandez, N., Mothe, J., Poulain, S.: Customizing information access according to domain and task knowledge: the ontoexplo system. In: SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 607–608. ACM Press, New York, NY, USA (2005). DOI 10.1145/1076034.1076151
16. Horng, Y.J., Chen, S.M., Lee, C.H.: Automatically constructing multi-relationship fuzzy concept networks for document retrieval. Applied Artificial Intelligence 17(1), 303–328 (2003)
17. Jalali, V., Matash Borujerdi, M.: Information retrieval with concept-based pseudo-relevance feedback in MEDLINE. Knowledge and Information Systems pp. 1–12 (2010). DOI 10.1007/s10115-010-0327-7
18. Jin, W., Srihari, R.K., Ho, H.H., Wu, X.: Improving knowledge discovery in document collections through combining text retrieval and link analysis techniques. In: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, pp. 193–202. IEEE Computer Society, Washington, DC, USA (2007). DOI 10.1109/ICDM.2007.62. URL http://portal.acm.org/citation.cfm?id=1441428.1442071
19. Jung, J.J.: Taxonomy alignment for interoperability between heterogeneous virtual organizations. Expert Systems with Applications 34(4), 2721–2731 (2008). DOI 10.1016/j.eswa.2007.05.015
20. Kalfoglou, Y., Schorlemmer, M.: Ontology mapping: the state of the art. The Knowledge Engineering Review 18(1), 1–31 (2003)
21. Klein, M.: Combining and relating ontologies: an analysis of problems and solutions. In: A. Gomez-Perez, M. Gruninger, H. Stuckenschmidt, M. Uschold (eds.) Workshop on Ontologies and Information Sharing, IJCAI’01. Seattle, USA (2001). URL citeseer.ist.psu.edu/klein01combining.html
22. Kolte, S.G., Bhirud, S.G.: Exploiting links in WordNet hierarchy for word sense disambiguation of nouns. In: Proceedings of the International Conference on Advances in Computing, Communication and Control, ICAC3 ’09, pp. 20–25. ACM, New York, NY, USA (2009). DOI 10.1145/1523103.1523108
23. Lau, R.Y.K., Li, Y., Xu, Y.: Mining fuzzy domain ontology from textual databases. In: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 156–162. IEEE Computer Society, Washington, DC, USA (2007). DOI 10.1109/WI.2007.76
24. Leite, M.A., Ricarte, I.L.: A framework for information retrieval based on fuzzy relations and multiple ontologies. In: IBERAMIA ’08: Proceedings of the 11th Ibero-American conference on AI, pp. 292–301. Springer-Verlag, Berlin, Heidelberg (2008)
25. Leite, M.A.A., Ricarte, I.L.M.: Document retrieval using fuzzy related geographic ontologies. In: GIR ’08: Proceeding of the 2nd international workshop on Geographic information retrieval, pp. 47–54. ACM, New York, NY, USA (2008). DOI 10.1145/1460007.1460021
26. Leite, M.A.A., Ricarte, I.L.M.: Fuzzy information retrieval model based on multiple related ontologies. In: 20th IEEE International Conference on Tools with Artificial Intelligence, pp. 309–316. IEEE Computer Society, Washington, DC, USA (2008)
27. Leite, M.A.A., Ricarte, I.L.M.: Multiple ontologies with fuzzy relations. Internet page, School of Electrical and Computer Engineering - University of Campinas - UNICAMP (2011). http://www.dca.fee.unicamp.br/~ricarte/MORFuzz/
28. Luaces, M.R., Paramá, J.R., Pedreira, O., Seco, D.: An ontology-based index to retrieve documents with geographic information. In: SSDBM ’08: Proceedings of the 20th International Conference on Scientific and Statistical Database Management, pp. 384–400. Springer-Verlag, Berlin, Heidelberg (2008)
29. Luaces, M.R., Paramá, J.R., Pedreira, O., Seco, D., Viqueira, J.R.R.: An index structure to retrieve documents with geographic information. In: DEXA ’07: Proceedings of the 18th International Conference on Database and Expert Systems Applications, pp. 64–68. IEEE Computer Society, Washington, DC, USA (2007). DOI 10.1109/DEXA.2007.35
30. Madin, J.S., Bowers, S., Schildhauer, M.P., Jones, M.B.: Advancing ecological research with ontologies. Trends in Ecology and Evolution 23(3), 159–168 (2008)
31. McGuinness, D.L., Chang, C.: Wine Agent 1.0. Internet page, Stanford University (2009). http://onto.stanford.edu:8080/wino/index.jsp
32. McGuinness, D.L., Fikes, R., Rice, J., Wilder, S.: An environment for merging and testing large ontologies. In: Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR2000), pp. 483–493 (2000)
33. Mothe, J., Chrisment, C., Dousset, B., Alaux, J.: DocCube: multi-dimensional visualisation and exploration of large document sets. Journal of the American Society for Information Science and Technology 54(7), 650–659 (2003). DOI 10.1002/asi.10257
34. Nachtegael, M., De Cock, M., Van der Weken, D., Kerre, E.E.: Fuzzy relational images in computer science. In: Lecture Notes in Computer Science, vol. 2561, pp. 134–151. Springer-Verlag, London, UK (2002)
35. Noy, N.F.: Semantic integration: a survey of ontology-based approaches. SIGMOD Record 33(4), 65–70 (2004). DOI 10.1145/1041410.1041421
36. Noy, N.F., Musen, M.A.: PROMPT: Algorithm and tool for automated ontology merging and alignment. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pp. 450–455. AAAI Press / The MIT Press (2000)
37. Ogawa, Y., Morita, T., Kobayashi, K.: A fuzzy document retrieval system using the keyword connection matrix and a learning method. Fuzzy Sets and Systems 39(2), 163–179 (1991). DOI 10.1016/0165-0114(91)90210-H
38. On, B.W., Lee, I., Lee, D.: Scalable clustering methods for the name disambiguation problem. Knowledge and Information Systems pp. 1–23 (2011). DOI 10.1007/s10115-011-0397-1
39. Parry, D.: Fuzzy ontologies for information retrieval on the WWW. In: E. Sanchez (ed.) Fuzzy Logic and the Semantic Web, pp. 21–48. Elsevier B.V., Amsterdam (2006)
40. Paz-Trillo, C., Wassermann, R., Braga, P.P.: An information retrieval application using ontologies. Journal of the Brazilian Computer Society pp. 17–31 (2006)
41. Pedrycz, W., Gomide, F.: An Introduction to Fuzzy Sets: Analysis and Design. MIT Press, Cambridge, Massachusetts (1998)
42. Pedrycz, W., Gomide, F.: Fuzzy Systems Engineering: Toward Human-Centric Computing. John Wiley & Sons, Inc (2007)
43. Pereira, R., Ricarte, I., Gomide, F.: Fuzzy relational ontological model in information search systems. In: E. Sanchez (ed.) Fuzzy Logic and the Semantic Web, pp. 395–412. Elsevier B.V., Amsterdam (2006)
44. Pinto, H.S., Gómez-Pérez, A., Martins, J.P.: Some issues on ontology integration. In: Proceedings of the IJCAI-99 Workshop on Ontologies and Problem Solving Methods (1999)
45. Plangprasopchok, A., Lerman, K., Getoor, L.: Growing a tree in the forest: constructing folksonomies by integrating structured metadata. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’10, pp. 949–958. ACM, New York, NY, USA (2010). DOI 10.1145/1835804.1835924
46. Projeto SISGA: Mapa do Clima no Brasil. Internet page, Universidade Regional de Blumenau (2009). http://www2.inf.furb.br/sisga/educacao/ensino/mapaClima.php
47. Quan, T.T., Hui, S.C., Cao, T.H.: Ontology-based fuzzy retrieval for digital library. In: Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers, ICADL’07, pp. 95–98. Springer-Verlag, Berlin, Heidelberg (2007)
48. Shah, U., Finin, T., Joshi, A.: Information retrieval on the semantic web. In: CIKM ’02: Proceedings of the eleventh international conference on Information and knowledge management, pp. 461–468. ACM Press, New York, NY, USA (2002). DOI 10.1145/584792.584868
49. Shehata, S., Karray, F., Kamel, M.: Enhancing text retrieval performance using conceptual ontological graph. In: Proceedings of the Sixth IEEE International Conference on Data Mining - Workshops, pp. 39–44. IEEE Computer Society, Washington, DC, USA (2006). DOI 10.1109/ICDMW.2006.71. URL http://portal.acm.org/citation.cfm?id=1260200.1260425
50. Shehata, S., Karray, F., Kamel, M.: A concept-based model for enhancing text categorization. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’07, pp. 629–637. ACM, New York, NY, USA (2007). DOI 10.1145/1281192.1281260
51. Silva, N., Rocha, J.: Complex semantic web ontology mapping. Web Intelligence and Agent Systems 1(3,4), 235–248 (2003)
52. de Souza, K.X.S., Davis, J.: Expanding queries in knowledge management systems. In: R.P. Katarzyniak (ed.) Ontologies and Soft Methods in Knowledge Management, pp. 3–18. Advanced Knowledge International Pty Ltd., Poland (2005)
53. Straccia, U., Lopes, N., Lukacsy, G., Polleres, A.: A general framework for representing and reasoning with annotated semantic web data. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), pp. 1437–1442. AAAI Press (2010)
54. Tvarozek, M., Bielikova, M.: Personalized faceted navigation for multimedia collections. In: SMAP ’07: Proceedings of the Second International Workshop on Semantic Media Adaptation and Personalization, pp. 104–109. IEEE Computer Society, Washington, DC, USA (2007). DOI 10.1109/SMAP.2007.33
55. Widyantoro, D.H., Yen, J.: A fuzzy ontology-based abstract search engine and its user studies. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 1291–1294. IEEE Computer Society, Washington, DC, USA (2001)
56. Wikipédia: Köppen Climate Classification. Internet page, Wikimedia Foundation (2009). http://en.wikipedia.org/wiki/Köppen_climate_classification
57. Zhang, L., Yu, Y., Zhou, J., Lin, C., Yang, Y.: An enhanced model for searching in semantic portals. In: WWW ’05: Proceedings of the 14th international conference on World Wide Web, pp. 453–462. ACM, New York, NY, USA (2005). DOI 10.1145/1060745.1060812