SEMANTIC ANNOTATION FRAMEWORK FOR WEB RESOURCES
Sahar Maâlej Dammak 1, Anis Jedidi 2 and Rafik Bouaziz 3
1,2 Sfax University, MIRACL Laboratory, Technological pole, ISIMS, Tunis Road, BP 242, 3021 Sfax, Tunisia
[email protected] [email protected]
3 Sfax University, MIRACL Laboratory, Technological pole, FSEGS, Airport Road, BP 1088, 3018 Sfax, Tunisia
[email protected]
ABSTRACT
To address the problems of querying Web resources by semantic criteria, we propose a new approach based on the semantic annotation of these resources. Our proposals are intended to improve the interrogation process of the Semantic Web. This work is derived from the analysis of the RDF result produced by the "Semantic Radar" Plug-in and of domain ontologies. Following our annotation process, we define equivalence rules and a semantic annotation model according to the ontologies and the result generated by "Semantic Radar". We then present the principal implementation parts of our Framework for semantic annotation.
KEYWORDS
Web annotation, Semantic annotation model, Domain ontology, Semantic Radar, Semantic search engine.
1. INTRODUCTION
One of the objectives of the Semantic Web (SW) is to describe Web resources in order to improve the interrogation process. Web resources are very heterogeneous in both their structure and the language used, so the annotation process must be automated. In this paper, we propose an approach to the semantic annotation of Web resources based on domain ontologies enriched by descriptor families (FOAF, SIOC, etc.). In addition, we consider the development of a semantic annotation model which automatically uses the results obtained from the Semantic Radar Plug-in (SR), which is integrated into Mozilla Firefox [2] [10]. SR extracts the semantic data that exist on each Web page and its links into an RDF file. We are interested here in the operationalization of this approach in order to automate the annotation process, i.e., in the creation and instantiation of contextual and semantic relationships between the different Web resources to accelerate the interrogation process.
Section 2 presents our motivations and related work. Section 3 describes the semantic annotation process of Web resources. Section 4 presents the proposed semantic annotation model. Section 5 describes the semantic search engine approach. Finally, Section 6 presents our conclusions and perspectives.
2. MOTIVATION AND RELATED WORK
Our goal is to accelerate the interrogation process of Web resources. Creating contextual and semantic relationships between these resources and annotating their semantic content can facilitate this process. In this context, our work is intended to improve Web search without using any documentary corpus defined in advance. To achieve this, we have studied tools for extracting metadata such as RDFa distiller [8] and Semantic Radar [10]. As these tools may produce errors in the annotation of resources, they do not really indicate the semantics of a Web page and its relations with other pages. In addition, we have benefited from some recent approaches to the semantic annotation of semi-structured documents and Web documents, such as the approaches presented in [1] and [12]. We have also studied systems for the automatic generation of semantic descriptions, such as the WebCat Framework [7] and the ALLRIGHT system [11]. Compared to our goal, these approaches and systems are limited to only a part of a Web page (text, images, etc.) when extracting the RDF metadata. In addition, they do not address the semantic relationships between Web resources during the extraction. Based on these studies, we conclude that using ontologies is a very encouraging solution for the annotation of Web resources. We can then define a new approach to the semantic annotation of Web resources that takes into account the existing links between Web pages. This annotation will be more relevant if it uses an ontology. It can help to solve the problems of Web resource annotation identified in the literature, and to improve the interrogation process in the SW [6].
3. APPROACH TO SEMANTIC ANNOTATION OF WEB RESOURCES
We propose a new approach to the semantic annotation of Web resources. Our approach is based on domain ontologies that have been extended by FOAF and SIOC concepts in the instantiation process. It is also based on the semantic structures instantiated by the SR tool in a set of RDF files [6]. The general steps of the proposed approach are as follows (cf. Figure 1). After the specification of the study field, we interrogate the Web through a semantic search engine. This interrogation is based on the concepts of the domain ontology. Then, we intend to define a method for filtering Web pages based on a selected relevance threshold: the Web pages returned by the interrogation pass through the filtering process so that only the most relevant are kept. Subsequently, we submit the filtered Web pages to the automatic semantic analysis by SR. This analysis returns, for each page, an RDF file containing descriptors such as FOAF, SIOC, etc. The set of RDF files extracted by SR is the input of the annotation process. We also intend to develop a method for the annotation of Web pages, which produces RDF metadata for each Web resource and which is based on equivalence rules and on a semantic annotation model that we intend to develop. The new RDF resource will be linked to the original resource and then released on the Web.
Figure 1. Steps of our approach
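The filtering step of Section 3 can be illustrated with a minimal Python sketch. How relevance is scored is not specified here, so the sketch assumes a hypothetical score: the share of ontology concept labels that occur in the page text. The names relevance and filter_pages, the dictionary input and the default threshold of 0.5 are illustrative assumptions, not part of the prototype.

```python
def relevance(page_text, concepts):
    """Hypothetical score: share of ontology concept labels found in the page text."""
    if not concepts:
        return 0.0
    text = page_text.lower()
    hits = sum(1 for concept in concepts if concept.lower() in text)
    return hits / len(concepts)


def filter_pages(pages, concepts, threshold=0.5):
    """Keep the URLs whose score reaches the threshold, highest score first.

    pages maps each URL to its text; threshold is the selected relevance threshold.
    """
    scored = [(relevance(text, concepts), url) for url, text in pages.items()]
    kept = [(score, url) for score, url in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in kept]
```

Only the pages returned by filter_pages would then be passed on to the SR analysis; any other scoring function could be substituted without changing the thresholding logic.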
4. THE PROPOSED SEMANTIC ANNOTATION MODEL
The objective of this paper is to detail the proposed semantic annotation model through an algorithm, and to demonstrate the feasibility of this model in our semantic annotation Framework for Web resources. The proposed model requires an equivalence search between the concepts of an RDF file (the result of the analysis carried out by the SR Plug-in on Web pages) and the concepts of the enriched domain ontology (which represent the semantics of the domain). As mentioned before, the SR Plug-in recognizes all the content of the page and indicates the presence of the following data (RDF data of the SW): FOAF, SIOC, DOAP and RDFa [10] (cf. Figure 1). To achieve this equivalence, we need extended ontologies that add concepts of the standards FOAF, SIOC, DOAP and/or RDFa. Although we did not find in the literature an ontology that includes all these standards, we note that the extensions made to ontologies usually use the SIOC and FOAF standards. We therefore choose to use ontologies of this type, such as Bio-Zen [9], SWORE [5], etc. We can also use the concepts of the SIOC ontology (Semantically-Interlinked Online Communities) [14] [3] and the FOAF ontology (Friend-of-a-Friend) [13] [4] in the annotation of Web resources to further enrich it. We need to define equivalence rules between the enriched domain ontologies and the RDF description produced by the SR Plug-in in order to integrate these rules into the resource description. This equivalence is identified in the relationship between the FOAF concepts of the FOAF ontology and the SIOC concepts of the SIOC ontology [3]. We present here only one of the four rules that we propose, since all of them follow the same annotation process, using a relationship between a FOAF concept and a SIOC concept.
Rule 1:
Algorithmically:
    If (there is a concept OnlineAccount (FOAF) in RDF of SR) then
        Annotate the concept: (Ann01)
            .... // the result of Semantic Radar
    End if
By the predicate logic:
    CFSR: OnlineAccount → Ann01 (CFSR-OnlineAccount)
Legend:
    CFSR-OnlineAccount: The FOAF concept of Semantic Radar-OnlineAccount
Explanation: There is a link between the SIOC concept “UserAccount” and the FOAF concept “OnlineAccount” [3]. We therefore propose to annotate the FOAF concept “OnlineAccount” by the link that relates it to the other concept (by changing the RDF result of SR after querying). We follow the same approach in the definition of the other rules. We now present the proposed semantic annotation model, which we have decided to subdivide into two sub-models.
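As an illustration of Rule 1, the following minimal Python sketch detects foaf:OnlineAccount elements in an RDF/XML document and records, for each one, the link to the SIOC concept UserAccount. The serialisation of the annotation Ann01 is not reproduced here, so the sketch only returns plain tuples; the function name apply_rule_1 and the tuple layout are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Link named in Rule 1. How it is written inside Ann01 is not shown in the
# text, so it is kept here as a pair of labels (an assumption of this sketch).
RULE_1_LINK = ("foaf:OnlineAccount", "sioc:UserAccount")


def apply_rule_1(rdf_xml):
    """Return one (subject URI, FOAF concept, linked SIOC concept) tuple
    for each foaf:OnlineAccount element found in the RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    annotations = []
    for node in root.iter("{%s}OnlineAccount" % FOAF):
        uri = node.get("{%s}about" % RDF)  # None when the node has no rdf:about
        annotations.append((uri, RULE_1_LINK[0], RULE_1_LINK[1]))
    return annotations
```

The other rules could be sketched in the same way by changing the element that is searched for and the link that is recorded.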
4.1. The Annotation Sub-Model Based On FOAF
When this sub-model is used in a search of equivalence between the result of the SR Plug-in and the enriched domain ontology (Onto), we propose to use the FOAF ontology to better enrich the annotation. This sub-model is based on the equivalence between all the FOAF concepts of the enriched domain ontology and all the FOAF concepts generated by SR. We describe the proposed sub-model algorithmically.
Begin
  For all FOAF-Concept-Onto
    For all FOAF-Concept-SR
      If (FOAF-Concept-Onto = FOAF-Concept-SR) then
        If (FOAF-Concept-Onto has a ParC) and (FOAF-Concept-Onto hasn't a ChildC) then
          Annotate the concept: (Ann1)
            <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
              ParC-Concept-FOAF-Onto
              or concept related to the current concept in the FOAF ontology
              or ... // the result of SR
            </FOAF:FOAF-Concept-SR>
        else If (FOAF-Concept-Onto hasn't a ParC) and (FOAF-Concept-Onto has a ChildC) then
          Annotate the concept: (Ann2)
            <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
              ChiC-Concept-FOAF-Onto
              or concept related to the current concept in the FOAF ontology
              or ... // the result of SR
            </FOAF:FOAF-Concept-SR>
          else If (FOAF-Concept-Onto has a ParC) and (FOAF-Concept-Onto has a ChildC) then
            Annotate the concept: (Ann3)
              <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
                ParC-Concept-FOAF-Onto
                ChiC-Concept-FOAF-Onto
                or concept related to the current concept in the FOAF ontology
                or ... // the result of SR
              </FOAF:FOAF-Concept-SR>
          End if
        End if
        End if
      else If (FOAF-Concept-Onto exists in the path FOAF-Concept-SR) then
          If (FOAF-Concept-Onto exists as a child) then
            Annotate the concept: (Ann4)
              <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
                FOAF-Concept-Onto
                or concept related to the current concept in the FOAF ontology
                or ... // the result of SR
              </FOAF:FOAF-Concept-SR>
          else If (FOAF-Concept-Onto exists as a parent) then
              If (FOAF-Concept-Onto has a ParC) and [(FOAF-Concept-SR, under FOAF-Concept-Onto, hasn't a ChildC) or (FOAF-Concept-SR doesn't exist in the ontology)] then
                Annotate the concept: (Ann5)
                  <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
                    FOAF-Concept-Onto
                    ParC-Concept-FOAF-Onto
                    or concept related to the current concept in the FOAF ontology
                    or ... // the result of SR
                  </FOAF:FOAF-Concept-SR>
              else If (FOAF-Concept-Onto hasn't a ParC) and (FOAF-Concept-SR, under FOAF-Concept-Onto, has a ChildC) then
                Annotate the concept: (Ann6)
                  <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
                    FOAF-Concept-Onto
                    ChiC-Concept-FOAF-SR
                    or concept related to the current concept in the FOAF ontology
                    or ... // the result of SR
                  </FOAF:FOAF-Concept-SR>
                else If (FOAF-Concept-Onto has a ParC) and (FOAF-Concept-SR, under FOAF-Concept-Onto, has a ChildC) then
                  Annotate the concept: (Ann7)
                    <FOAF:FOAF-Concept-SR rdf:about = « URI of the FOAF concept »>
                      FOAF-Concept-Onto
                      ParC-Concept-FOAF-Onto
                      ChiC-Concept-FOAF-SR
                      or concept related to the current concept in the FOAF ontology
                      or ... // the result of SR
                    </FOAF:FOAF-Concept-SR>
                End if
              End if
              End if
            End if
          End if
        else
          No annotation of the FOAF-Concept-SR by the ontology concepts.
        End if
      End if
    End for
  End for
End
This proposed RDF semantic annotation allows us to semantically annotate the FOAF-Concept-SR by the concepts of the ontology.
Legend:
  FOAF-Concept-Onto: The FOAF concept of the ontology.
  FOAF-Concept-SR: The FOAF concept of the SR RDF.
  ParC: The parent concept.
  ChildC: The child concept.
  ParC-Concept-FOAF-Onto: The parent concept of the FOAF concept in the ontology.
  ChiC-Concept-FOAF-Onto: The child concept of the FOAF concept in the ontology.
  ChiC-Concept-FOAF-SR: The child concept, in the ontology, of the FOAF concept of the SR.
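The first branch of the FOAF sub-model (cases Ann1 to Ann3, where the ontology concept equals the SR concept) can be sketched in Python as follows. This is a partial illustration under stated assumptions: the ontology hierarchy is represented by two hypothetical dictionaries, parent_of and children_of; the path-based cases Ann4 to Ann7 are not covered; and a concept with neither parent nor child, a case the algorithm does not describe, is left unannotated.

```python
def annotate_equal_concept(concept, parent_of, children_of):
    """Choose among Ann1, Ann2 and Ann3 for a concept found in both the
    enriched ontology and the SR RDF; returns (annotation label, descriptors)."""
    parent = parent_of.get(concept)
    children = list(children_of.get(concept, []))
    if parent and not children:
        return ("Ann1", [parent])             # parent concept only
    if not parent and children:
        return ("Ann2", children)             # child concepts only
    if parent and children:
        return ("Ann3", [parent] + children)  # parent and child concepts
    return (None, [])                         # case not described by the algorithm


def annotate(onto_concepts, sr_concepts, parent_of, children_of):
    """Apply the equality branch to every SR concept. Concepts absent from the
    ontology are left unannotated here; the full model would first try the
    path-based cases Ann4 to Ann7."""
    result = {}
    for sr_concept in sr_concepts:
        if sr_concept in onto_concepts:
            result[sr_concept] = annotate_equal_concept(sr_concept, parent_of, children_of)
        else:
            result[sr_concept] = (None, [])
    return result
```

For example, with a toy hierarchy in which Agent is the parent of Person and Group, the concept Person falls under Ann1 and Agent under Ann2.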
4.2. The Annotation Sub-Model Based On SIOC
When this sub-model is used in a search of equivalence between the result of the SR Plug-in and the enriched domain ontology, we propose to use the SIOC ontology to better enrich this part of the annotation. This second sub-model is based on the equivalence between all the SIOC concepts of the enriched domain ontology and all the SIOC concepts generated by SR. In addition, we propose to add to this annotation, as descriptors, the names of the various links of the SIOC ontology that are related to the concept to annotate. In other words, the annotation proposed in this sub-model is based, on the one hand, on the descriptors proposed in the FOAF annotation and, on the other hand, on the link names existing in the SIOC ontology in relation to the concept to annotate.
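The additional descriptors of the SIOC sub-model, i.e. the names of the links related to the concept to annotate, can be collected as in the following minimal Python sketch. The links are assumed to be available as (source concept, link name, target concept) triples; this representation and the function name related_link_names are illustrative assumptions.

```python
def related_link_names(concept, links):
    """Names of the links that have the concept as source or target,
    in order of first appearance and without duplicates."""
    names = []
    for source, name, target in links:
        if concept in (source, target) and name not in names:
            names.append(name)
    return names


# Toy triples for illustration only; they are not checked against the
# SIOC specification.
toy_links = [
    ("Post", "has_creator", "UserAccount"),
    ("UserAccount", "creator_of", "Post"),
    ("Forum", "has_host", "Site"),
]
descriptors = related_link_names("UserAccount", toy_links)
```

The resulting names would be added to the descriptors already produced by the FOAF sub-model.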
5. APPROACH OF THE NEW SEMANTIC SEARCH ENGINE
We have undertaken the development of a prototype, called "Querying Web", using the "Semantic Radar" Plug-in to extract semantic descriptors (Step 3 of our approach) and "Eclipse" as the development tool to automate the extraction of RDF metadata (Step 4 of the approach). We limit our presentation to the principal implementation parts of our Framework for semantic annotation, through the following steps, which we have defined to automate the extraction of the RDF annotation metadata of Web resources. Figure 2 shows the interface of our semantic search engine.
Figure 2. New semantic search engine - Example of the result of a first Web search
5.1. Writing a Semantic Query
As we have shown, Web querying is done through a new semantic search engine (cf. Figure 2). The domain expert must first specify the study domain (right part of the graphical interface, "Domain Ontologies"). Once this domain is specified, we show the user the hierarchy of the ontology to assist them in writing the semantic query. In addition, we ask the user to follow a well-defined syntax when writing this request, in order to obtain a correct advanced search on the Web (cf. Figure 2).
General syntax of search query: “Ontology+Name”+Keyword+OR+Keyword+OR+Keyword etc.
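The general query syntax of Section 5.1 can be produced by a small helper such as the following Python sketch. The function name build_query is illustrative, straight double quotes are used in place of the typographic quotes, and each keyword is assumed to be a single word.

```python
def build_query(ontology_name, keywords):
    """Build a query of the form "Ontology+Name"+Keyword+OR+Keyword."""
    # Words of the ontology name are joined with '+' inside the quotes.
    quoted_name = '"' + "+".join(ontology_name.split()) + '"'
    if not keywords:
        return quoted_name
    # Keywords are separated by +OR+ and appended after a '+'.
    return quoted_name + "+" + "+OR+".join(keywords)
```

For example, build_query("Bio Zen", ["gene", "protein"]) yields "Bio+Zen"+gene+OR+protein, where the ontology name and keywords are arbitrary sample values.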
5.2. Web Querying
Having studied the options for interrogating the Internet from our search engine, we note the need to use a Web service to communicate with Google. The Google Custom Search API (footnote 9) is an API to retrieve and display results from a Custom Search Engine. With this API, we can use REST requests to obtain the results of Web searches. The API limits usage to 100 requests per day. We can therefore use the Google Custom Search API to query the Web with Google from our Java application in Eclipse. This API replaces an older API (footnote 10) called “Google Web Search API”, which has been officially deprecated as of November 1, 2010. The Custom Search API Project and the Custom Search Engine are run for each Web search in the background and transparently to the user.
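A REST request to the Custom Search API can be prepared as in the following minimal sketch. It is written in Python for brevity, whereas the prototype is a Java application; the function names search_url and result_links are illustrative, and the API key and engine identifier are placeholders to be supplied by the user. The sketch only builds the request URL and reads the links from an already decoded JSON response, so it performs no network call.

```python
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def search_url(query, api_key, engine_id, start=1):
    """Build the request URL for one page of 10 results.

    api_key and engine_id (the 'key' and 'cx' parameters) are placeholders
    for the credentials of the Custom Search API project and engine.
    """
    params = {"key": api_key, "cx": engine_id, "q": query, "num": 10, "start": start}
    return ENDPOINT + "?" + urlencode(params)


def result_links(response):
    """Extract the result URLs from a decoded JSON response (a dict)."""
    return [item["link"] for item in response.get("items", [])]
```

Since the number of requests per day is limited, each call to search_url corresponds to one request counted against that quota once it is actually sent.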
5.3. Definition of a Search Method
After the query is written, the search is conducted through a connection between the Google Custom Search API and the Custom Search Engine. As a search method, we propose, at each Web interrogation, to display the URLs returned by Google (10 links) in a table, the Table of URL (cf. Figure 2), together with the number of the search list (cf. Figure 2). In addition, with each search we import the RDF descriptions of the Web pages returned in the list. An internal file, "RDF.txt", is created automatically to be the first source of analysis. This file contains the RDF descriptors extracted by SR only for the pages that have a first level of semantic annotation (cf. Figure 3).
Figure 3. Example of a first level of semantic annotation on a Web page
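The selection of pages for "RDF.txt" can be illustrated with a minimal Python sketch, assuming that a page has a first level of semantic annotation when its HTML advertises RDF data through a link element of type application/rdf+xml. This criterion is an assumption of the sketch, as are the names RdfLinkFinder and pages_with_rdf; the prototype itself relies on the SR Plug-in for this detection.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect the href of every <link> element whose type is application/rdf+xml."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        link_type = (attributes.get("type") or "").lower()
        if link_type == "application/rdf+xml" and attributes.get("href"):
            self.rdf_links.append(attributes["href"])


def pages_with_rdf(pages):
    """Map each URL to its advertised RDF links, keeping only the pages
    that advertise at least one (pages maps URL to HTML source)."""
    found = {}
    for url, html in pages.items():
        finder = RdfLinkFinder()
        finder.feed(html)
        if finder.rdf_links:
            found[url] = finder.rdf_links
    return found
```

The RDF documents behind the collected links would then be fetched and appended to the internal file that serves as the first source of analysis.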
5.4. Generation of Annotation Metadata
In this step, we propose to produce the RDF annotation metadata for each Web resource. The required inputs are ready: the enriched domain ontology (.OWL), the FOAF and SIOC ontologies (.RDF) and the file RDF.txt. The remaining task is to apply the equivalence rules and the proposed semantic annotation model to generate an RDF document linked to the original resource on the Web. This task is in progress and will be published later. After the proposed annotation of Web resources, a new search returns a new sorting of the results: annotated resources appear at the top of the list. Once the annotation process is implemented, we expect to achieve a more relevant search of Web resources. We plan to test these proposals on a typical application (study domain) in order to evaluate them through the percentage of annotation that is automated.
9 https://developers.google.com/custom-search/
10 https://developers.google.com/web-search/
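The new sorting of results described in Section 5.4, in which annotated resources appear at the top of the list, can be sketched in Python as a stable sort. The function name rerank and the representation of the annotated resources as a set of URLs are illustrative assumptions.

```python
def rerank(urls, annotated):
    """Move annotated URLs to the top while preserving the original order
    inside each group (sorted() is stable, and False sorts before True)."""
    return sorted(urls, key=lambda url: url not in annotated)
```

Because the sort is stable, the ranking returned by the search engine is preserved among the annotated pages and among the remaining ones.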
6. CONCLUSIONS AND FURTHER WORK
In this paper, we have shown the steps of the approach we propose for the semantic annotation of Web resources. This approach helps querying in an SW environment. In addition, we have proposed equivalence rules and an annotation model used in the search of equivalence between the result of the analysis of a Web page by SR and the enriched domain ontology. Finally, we have presented some implementation elements of the new semantic search engine. In future work, we intend to automate the generation of RDF metadata that respects the proposed semantic annotation model. In addition, we will propose assigning annotation weights in the metadata, so as to associate with each line of annotation a weight indicating the degree of annotation. In this way, we seek to create semantic and fuzzy annotation metadata. We also aim at a total automation of the semantic and fuzzy annotation of Web resources in all these steps.
REFERENCES
[1] Benyahia, K., Lehireche, A. & Latreche, A. (2009) “Annotation Sémantique De Pages Web”, The 2nd Conférence Internationale sur l'Informatique et ses Applications (CIIA'09), Saida-Algeria.
[2] Bojars, U., Passant, A., Giasson, F. & Breslin, J. (2007) “An Architecture to Discover and Query Decentralized RDF Data”, 3rd Workshop on Scripting For The Semantic Web, Innsbruck, Austria.
[3] Bojars, U. & Breslin, J. (2010) “Sioc core ontology specification”. http://rdfs.org/sioc/spec/.
[4] Brickley, D. & Miller, L. (2010) “Foaf vocabulary specification”. http://xmlns.com/foaf/spec/.
[5] Lohmann, S. & Riechert, T. (2010) “Adding semantics to social software engineering: (re-) using ontologies in a community-oriented requirements engineering environment”, The Software Engineering (Workshops), Vol 160GI, pp 485-494.
[6] Maâlej, S. (2012) “Annotation sémantique des ressources Web: État de l’art et perspectives de recherche”, Inforsid-2012-Séminaire Doctoral 1, Montpellier-France, pp 591-598.
[7] Martins, B. & Silva, MJ. (2005) “The WebCAT Framework - Automatic Generation of Meta-Data for Web Resources”, Web Intelligence, Compiegne-France, pp 236-242.
[8] RDFa D. (2010) “RDFa Distiller”. http://www.w3.org/2007/08/pyRdfa/.
[9] Samwald, M. & Adlassnig, K. (2008) “The bio-zen plus ontology”, The Journal Applied Ontology, Vol. 3, No. 4, pp 213-217. DOI= http://dl.acm.org/citation.cfm?id=1516155.
[10] SemanticR. (2009) “Semantic Radar”. https://addons.mozilla.org/en-US/firefox/addon/semanticradar/.
[11] Shchekotykhin K, M., Jannach, D., Friedrich, G. & Kozeruk, O. (2007) “AllRight: Automatic Ontology Instantiation from Tabular Web Documents”, The 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference, Busan-Korea, pp 466-479.
[12] Thiam, M. (2010) “Annotation Sémantique de Documents Semi-structurés pour la Recherche d’Information”, Thèse de doctorat, Université Paris Sud-Paris XI.
[13] W3C. (2009) “Friend-of-a-friend (foaf)”. http://www.foaf-project.org/.
[14] W3C. (2007) “Semantically-interlinked online communities (sioc)”. http://sioc-project.org/.