Semi-Automated Mapping From RDB To Ontology
Vahid Jalali
Computer engineering & IT department, Amirkabir University of Technology, Tehran, Iran
[email protected]
Alireza Bagheri
Computer engineering & IT department, Amirkabir University of Technology, Tehran, Iran
[email protected]
Abstract. This paper presents a semi-automated approach for mapping entities of a relational database (RDB) to the classes of an existing ontology. We use WordNet to extract the concepts shared between the RDB and the ontology. Once similar concepts have been extracted, data can be replicated from the database to the ontology.
I. INTRODUCTION
The construction of an ontology can be a time-consuming process, requiring the services of experts both in ontology engineering and in the domain of interest. Whilst this may be acceptable in some high-value applications, widespread adoption will require some sort of semi-automatic approach to ontology construction [9]. Although many semantic web applications have been built in recent years, there is still a lack of semantic data for these applications. Usually, pertinent data can be extracted from an existing relational database and converted to a semantic metadata representation. Several approaches have been introduced for replicating data from an RDB to an ontology; most of them build the ontology structure from scratch, following the ER model of the original database. In this paper we show how to replicate data from a relational database to an existing OWL model. We use WordNet [4] to identify the concepts shared between the database and the ontology. Having found these concepts, the system presents the user with the most closely related pairs from the database and the ontology and lets the user decide which concepts to match. In addition to this interactive mechanism, the user can specify mappings manually, as other systems already allow. The proposed system can thus automate a notable portion of the work by recommending similar concepts between the RDB and the existing ontology.
This paper is organized as follows: Section II reviews a couple of existing approaches to replicating legacy database data to an ontology. Section III discusses the different mapping cases. Section IV briefly introduces WordNet. Section V provides a full description of the proposed approach. Section VI discusses the performance of the proposed system. Section VII draws some conclusions and gives hints about unresolved problems and future work.
II. PERFORMED ACTIVITIES REVIEW
A. Database to Semantic Web Mapping using RDF Query Languages
In [1], Perez and Conrad propose a method for mapping a database to an ontology using an RDF query language. They assume that there is no pre-existing ontology model, so the user can build it as he/she desires.
The first step in Perez's approach is to convert the RDB data to a modified version of OWL, named Relational OWL [2]. Although it can be processed by any application that understands RDF, the data extracted using Relational OWL still lacks real semantic meaning: the information originally stored in relational tables is represented within a table object and not within an appropriate Semantic Web object. Once the Relational OWL representation of the relational database has been created, the second step, the actual mapping, can be performed. The RDF model just created may now be queried with an arbitrary RDF query language. As long as the query language is closed, the resulting query response is again within the Semantic Web, i.e. it is a valid RDF model or graph, and may then be processed by other Semantic Web applications using their own built-in functionality for reasoning tasks [5]. Perez's mapping approach is illustrated in Fig. 1.
B. R2O, an Extensible and Semantically Based Database to Ontology Mapping Language
As described above, there is no pre-existing ontology model in Perez's approach. Another system, named R2O [3], is designed for mapping an RDB to an existing ontology model. R2O is database independent, so in the first step the user has to describe the DB schema in R2O syntax. Having specified the DB schema, one can map similar concepts from the RDB to the existing ontology model and replicate the pertinent data. The overall architecture of R2O is presented in Fig. 2.
Figure 1. Perez Mapping Process
Figure 2. R2O Architecture
C. Comparing introduced methods
The first method has the advantage that users familiar with RDF query languages (especially SPARQL) do not need to learn any new syntax to perform the mapping. In addition, using SPARQL, the user can perform complex mappings between database and ontology. However, this approach has some drawbacks. First, as mentioned before, it cannot be applied where a predefined ontology model exists. Second, the intermediate Relational OWL representation can cause consistency problems when the database data and schema change frequently, though this problem can be solved by using a wrapper application that generates Relational OWL on demand. The second approach, on the other hand, requires learning a new language, R2O, which can be time-consuming for the user. It also forces the user to redefine the database schema in R2O syntax, which is a tedious activity (perhaps this part can be automated in refined versions of R2O). Despite these problems, R2O has some prominent advantages: it is compatible with semantic web applications that already have an implemented ontology model, it is DBMS independent, and it too can handle complex mapping cases. Both methods support mapping more than one concept in the database to the ontology, and both support applying arbitrary functions to the values of database columns before mapping them to ontology attributes.
III. MAPPING CHALLENGES
In [3], different mapping cases between an RDB and an ontology are discussed. In the first case, a table in the relational database represents the same entity as a class in the ontology. What is needed here is a direct mapping, which means transferring the data from each column of the table to its corresponding attribute in the specified ontology class. In the second case, two tables from the database are mapped to one class in the ontology. Here the records transferred to the ontology are generated from the join or union of those tables. In the third case, some fields of an entity in the RDB have no use in the ontology model, so a projection of the table is mapped to the related class in the ontology. The last mapping case that Barrasa introduces arises when only a limited set of records from a specific table in the RDB should be transferred to the ontology. Fig. 3 provides a brief illustration of these mapping cases. There are still cases that remain unaddressed. For example, in the second case more than two tables may be mapped to one ontology class, or a combination of the second case with the third or fourth case may be desired. Our proposed approach mainly addresses the four mapping cases introduced at the beginning of this section; it also covers the additional scenarios just mentioned.
Figure 3. Mapping Cases
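The mapping cases of [3] can be read as relational operations: a direct mapping is a plain SELECT, the multi-table case is a union (or join), the projection case keeps only some columns, and the selection case filters rows. The following Python sketch, using the standard sqlite3 module, builds such a query from a small mapping description. The Mapping class, the build_select function, the staff and faculty tables and the hasName attribute are hypothetical names introduced only for illustration. Only the union variant of the multi-table case is shown, and identifiers are interpolated without escaping, so they are assumed to be trusted.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical description of one table-to-class mapping."""
    target_class: str   # ontology class that receives the records
    tables: list        # one table = direct mapping, several = union
    columns: dict       # column name -> ontology attribute (projection)
    where: str = ""     # optional row filter (selection)


def build_select(m: Mapping) -> str:
    """Build the SQL text that extracts the records for one mapping."""
    cols = ", ".join(f"{col} AS {attr}" for col, attr in m.columns.items())
    parts = []
    for table in m.tables:
        query = f"SELECT {cols} FROM {table}"
        if m.where:
            query += f" WHERE {m.where}"
        parts.append(query)
    # Several tables feeding one class are combined with a union.
    return " UNION ALL ".join(parts)


def demo():
    """Run a union + projection + selection mapping on an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE staff(name TEXT, salary INT, active INT);
        CREATE TABLE faculty(name TEXT, salary INT, active INT);
        INSERT INTO staff VALUES ('Ann', 10, 1), ('Bob', 20, 0);
        INSERT INTO faculty VALUES ('Cy', 30, 1);
    """)
    mapping = Mapping("AcademyMember", ["staff", "faculty"],
                      {"name": "hasName"}, "active = 1")
    sql = build_select(mapping)
    rows = con.execute(sql).fetchall()
    con.close()
    return sql, rows
```

Keeping the four cases in one description object means that combinations, such as a union of more than two tables with a projection and a filter, need no extra machinery.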
IV. A LITTLE WORD ABOUT WORDNET
WordNet [4] is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. In our proposed approach, three kinds of relations between words are of major importance:
• Synonyms
• Hypernyms
• Holonyms
Synonyms are words that have the same meaning, e.g. “teacher” and “instructor” can be considered synonyms. A hypernym is a word that denotes a supertype of another word, e.g. “Person” is a hypernym of “Teacher”. Finally, a holonym is a word that denotes a whole of which another word denotes a part, e.g. “Bicycle” is a holonym of “Wheel”. WordNet can return the data related to each of these three relations for an arbitrary input word. Fig. 4 illustrates the WordNet semantic hierarchy for the terms “Teacher” and “Professor”.
Figure 4. Semantic Hierarchy of “Teacher” and “Professor”
V. PROPOSED APPROACH
Consider a database containing a table named “Teacher”, which stores data about teachers in a specific domain such as a university, and an ontology class named “Instructor” in our semantic web application. A probable mapping from the RDB to the ontology in this case is to map the “Teacher” table to the “Instructor” class. WordNet can be of great use in detecting such similarities between database and ontology, by using the synset of each word in the mapping process. What has been discussed so far corresponds to the direct mapping case explained in the mapping challenges section. Having detected the similarity between two concepts in the RDB and the ontology, our system presents the extracted shared concepts to the user and asks which of the pairs best satisfies the user's needs and the domain specification. In the next step, the columns of the selected table are mapped to the attributes of the chosen ontology class. For this part we do not recommend an automated approach that tries to extract similar semantics from the columns of the table and the attributes of the class; instead we recommend a manual attribute mapping process in a textual or graphical environment. In addition, the user can apply filters to the column values, or otherwise manipulate them, before they are mapped to the ontology.
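The “Teacher”/“Instructor” example can be sketched as follows: a table name and a class name that fall in the same synset are proposed as a candidate pair, and the user then confirms or rejects each proposal. This is only a minimal illustration. TOY_SYNSETS is a hand-written stand-in for WordNet lookups, and the function names synonyms, candidate_pairs and confirm are assumptions, not part of the described system.

```python
# Hand-written stand-in for WordNet synsets (an assumption for illustration);
# a real system would query WordNet itself.
TOY_SYNSETS = [
    {"teacher", "instructor"},
    {"student", "pupil"},
]


def synonyms(word):
    """Return the word together with every word sharing a toy synset with it."""
    key = word.lower()
    result = {key}
    for synset in TOY_SYNSETS:
        if key in synset:
            result |= synset
    return result


def candidate_pairs(tables, classes):
    """Propose (table, class) pairs whose names are equal or synonymous."""
    pairs = []
    for table in tables:
        table_synonyms = synonyms(table)
        for cls in classes:
            if cls.lower() in table_synonyms:
                pairs.append((table, cls))
    return pairs


def confirm(pairs, accept):
    """Keep only the pairs the user accepts.

    `accept` is a callback standing in for the interactive wizard.
    """
    return [pair for pair in pairs if accept(pair)]
```

For instance, `candidate_pairs(["Teacher", "Room"], ["Instructor", "Building"])` proposes only the Teacher/Instructor pair, which `confirm` then passes to the user for a decision.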
Once the previous steps are complete, the system automatically generates an SQL command that extracts the pertinent data from the database, and these data are replicated to the pre-existing ontology files using an OWL manipulation engine such as Jena [7]. Combining SQL and Jena to replicate data from the RDB to the ontology forms an engine comparable to XSLT [6]; this engine is the major component of our system architecture. Because the mapping process is embedded in the application, the user no longer needs to learn a new syntax, or even to use RDF query languages, in order to replicate data from the database to the ontology. Instead, our wizard-based system generates these scripts internally, and the user need not intervene unless he/she wants to. What has been discussed so far relates to direct mapping. Filtering, projection and applying functions to column values can, as mentioned, be performed at the attribute mapping level, so they pose no problem, but a couple of cases remain to be explained. Until now we have used synsets to detect common concepts in the RDB and the ontology, but what if the names of a related table and class are not so close that one appears in the synset of the other? Consider a database containing a table named “Teacher” that stores information about the people who teach courses in a university. There can be an ontology model for a semantic web application in which the domain modeler has chosen the name “Professor” for a class that serves the same purpose as the “Teacher” table in the other information system. The question is how our system can detect this kind of similarity. Return to Fig. 4 and take another look at the semantic hierarchy of “Teacher” and “Professor”. As can be seen, “Educator”, the parent of “Teacher”, is the grandparent of “Professor”. We can therefore merge the two hierarchies into a unified hierarchy in which the two words have a common ancestor. This idea can be used to detect further semantic similarities between arbitrary concepts. We introduce a metric that evaluates the similarity between words according to their distance in the WordNet hierarchy. For ease of use, we define this metric as the number of edges between the two concepts in the WordNet hierarchy. Under this definition the distance between synonyms is zero. Users can specify the desired precision in the system settings, so that the system can easily decide which concepts to present as similar. The last problem to be solved is mapping two or more tables from the database to one class in the ontology. In such scenarios the holonym relation between words is used, e.g. academy members can be staff or faculty, so the union of the staff and faculty tables can form the academy member class of the ontology.
VI. PERFORMANCE OF PROPOSED SYSTEM
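As a rough sketch of the cost analysed in this section, the edge-count metric of the proposed approach can be applied to every table/class pair. The hierarchy below is a hand-written toy stand-in for WordNet: only the Teacher/Educator/Professor chain follows the hierarchy described for Fig. 4, while the remaining links and the names PARENT, ancestors, distance and recommend are assumptions made for illustration. Synonyms are assumed to collapse onto the same node, which gives them distance zero.

```python
# Toy child -> parent (hypernym) links; an assumption standing in for WordNet.
PARENT = {
    "teacher": "educator",
    "academician": "educator",
    "professor": "academician",
    "educator": "professional",
    "professional": "person",
    "student": "person",
}


def ancestors(word):
    """Map the word and each of its ancestors to the edges walked to reach it."""
    depth = 0
    reached = {}
    while word is not None:
        reached[word] = depth
        word = PARENT.get(word)
        depth += 1
    return reached


def distance(a, b):
    """Number of edges between two words via their nearest common ancestor.

    Returns None when the toy hierarchy gives the words no common ancestor.
    """
    up_a = ancestors(a.lower())
    up_b = ancestors(b.lower())
    common = set(up_a) & set(up_b)
    if not common:
        return None
    return min(up_a[node] + up_b[node] for node in common)


def recommend(tables, classes, max_distance):
    """Compare every table with every class: N * M distance computations."""
    matches = []
    for table in tables:        # N tables
        for cls in classes:     # M classes
            d = distance(table, cls)
            if d is not None and d <= max_distance:
                matches.append((table, cls, d))
    return sorted(matches, key=lambda match: match[2])
```

Here `max_distance` plays the role of the user-specified precision: a smaller value yields fewer but closer recommendations, without changing the number of comparisons performed.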
If the proposed system is used only for finding direct mapping cases, it has to analyze the similarity between each pair of table and class. Consider a database with N tables and an ontology with M classes. The process of mapping the RDB to the ontology in this situation is of the order N * M. The story is different when two or more tables from the database are mapped to one class in the ontology. In this situation the mapping analysis is performed from the ontology to the database: in each step a class is chosen from the ontology and the tables in the database are checked to determine whether they can be part of the selected class. As mentioned, mapping multiple tables to one ontology class relies on the holonyms of the ontology class; the set of holonyms can be extended by adding the synonyms of each word in the set.
VII. CONCLUSION AND FUTURE WORKS
In this paper we have proposed an approach for semi-automated mapping from an RDB to an ontology. The proposed system can save a great deal of effort and time for users who want to use their legacy relational data in semantic web applications. We used WordNet to extract similarities between common concepts in the database and the ontology. As a step forward, instead of WordNet, which is a general knowledge base, more domain-specific semantic hierarchies can be used to gain a higher level of precision. Performance plays a pivotal role in software engineering, so more effort should be dedicated to improving the performance of the proposed system. In recent years some work on mapping RDBs to ontologies has been carried out using graph theory [8]; the approach proposed in this paper can also be combined with applications that use graph theory as the core component of their mapping process.
REFERENCES
[1] C. Pérez and S. Conrad, "Database to semantic web mapping using RDF query languages," 25th International Conference on Conceptual Modeling, Tucson, Arizona, November 2006, Springer Verlag.
[2] C. Pérez and S. Conrad, "Relational.OWL - A data and schema representation format based on OWL," Second Asia-Pacific Conference on Conceptual Modeling (APCCM2005), volume 43 of CRPIT, pages 89–96, Newcastle, Australia, 2005.
[3] J. Barrasa, Ó. Corcho, A. G. Pérez, "R2O, an extensible and semantically based database-to-ontology mapping language," Ontology Engineering Group, Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Spain.
[4] WordNet, a lexical database for the English language, Cognitive Science Laboratory, Princeton University, http://wordnet.princeton.edu/.
[5] E. Prud’hommeaux and A. Seaborne, SPARQL Query Language for RDF, http://www.w3.org/TR/2006/WD-rdf-sparql-query-20060220/, 2006. W3C Working Draft.
[6] XSL Transformations (XSLT), http://www.w3.org/TR/1999/REC-xslt-19991116, 1999.
[7] Jena, A semantic web framework for Java, http://jena.sourceforge.net/, 2006.
[8] J. Trinkunas, O. Vasilecas, "Building Ontologies from Relational Databases Using Reverse Engineering Methods," International Conference on Computer Systems and Technologies CompSysTech’07.
[9] J. Davies, R. Studer, P. Warren, "Semantic Web Technologies: Trends and Research in Ontology-based Systems," John Wiley & Sons, Ltd, 2006.
[10] C. Bizer, "D2R MAP - A Database to RDF Mapping Language," in WWW2003, The Twelfth International World Wide Web Conference, Budapest, Hungary, 2003. Poster presentation.