2007 International Conference on Computational Intelligence and Security Workshops
Using Relational Database to Build OWL Ontology from XML Data Sources
Jiuyun Xu, Weichong Li
School of Computer & Communication Engineering, China University of Petroleum
[email protected];
[email protected]
0-7695-3073-7/07 $25.00 © 2007 IEEE DOI 10.1109/CIS.Workshops.2007.139

Abstract
The Semantic Web and web services use ontologies to describe important concepts and the relations among them. However, constructing an ontology from scratch is costly and difficult. This paper proposes an approach that constructs an OWL ontology from an XML document with the help of the entity-relation model, which alleviates the difficulty of ontology construction.

1. Introduction
The Semantic Web is the current trend in the development of the World Wide Web. Its content is meant to be suitable not only for human consumption but also for machine processing. As a basis of the Semantic Web, XML has matured well beyond the early-adoption phase. However, XML itself provides only syntax and conveys little of the meaning of a document's content. The tags in an XML document are meaningful to humans but not to machines, so a person can easily understand the information in an XML document while a machine cannot process it effectively.

The Semantic Web uses ontologies to express the semantics of data. An ontology is defined as an explicit, formal specification of a shared conceptualization of a domain. An ontology not only provides a standard vocabulary for a problem domain but also contains structures or axioms that define the semantics of the vocabulary terms. Unfortunately, real-world knowledge rarely exists in ontology form, so how to construct a domain ontology is an important question. Ontology construction is expensive, time-consuming and laborious.

In this paper, an approach is proposed to build an OWL ontology from an XML document. We first map the XML document to an entity-relation model, and then extract the metadata information and structural restrictions of the entity-relation model to build an OWL ontology.

2. Related work
Two basic approaches to this topic can be adopted [1]. One is the top-down approach, in which an ontology is defined beforehand and then associated with the local schema or XML document instance. The work of Klein [2] and Gerald Reif et al. [3] belongs to this category: both propose a mapping from XML content to RDF metadata using a given ontology. The other is the bottom-up approach, in which an ontology is constructed from the conceptual schema of the local data source and all semantics of the ontology are derived from the local data sources.

Some related work follows the second approach and analyzes the structure of an XML document to access the semantics of its content. Some of it focuses on a general mapping between XML and RDF, while other work mainly aims at mapping from DTD [4] or XML Schema to OWL without sufficiently considering XML instance data. Sergej Melnik [5] holds that every XML document has an RDF model, and proposes building upon a simplified syntax to detect semantics in XML instance documents and map them to RDF documents. Matthias Ferdinand et al. [6] describe mappings from XML Schema to OWL as well as from XML to RDF, but the two mapping processes have no direct relation to each other, and the approach cannot create an OWL ontology from an XML document instance if no XML Schema is available. Hannes Bohring et al. [7] propose a mechanism that creates an XML Schema from an XML document when no suitable XML Schema is available; the XML Schema is then mapped to an OWL ontology. In earlier work [8] we proposed an approach that transforms a plain XML document to an OWL ontology in one step while simultaneously generating an OWL instance. We use the metadata at the same level of the XML tree model to build the OWL ontology, and the OWL instance is generated according to the OWL ontology and the XML document instance.
3. The XTR-RTO Mapping
In this section we propose an XTR (XML Transform to Relational database) mapping approach that maps an XML document to an entity-relation model, and an RTO (Relational database Transform to Ontology) mapping approach that maps the entity-relation model to an OWL ontology.

3.1. XTR Description
In the XTR mapping algorithm, we adopt the XML Schema concepts for defining elements. The data model of XML can be described as a node-labeled tree. If an element in the source XML tree is always a leaf, containing only a literal and no attributes, it is called a simpleType. An element that has subelements or attributes is called a complexType [8]. To make clear which classes and properties are defined in the ontology that describes the XML document, the mapping algorithm is described in detail below:
1. Each simpleType element and each attribute is mapped to a scalar type, which will be used as a column of a table; their values cannot be changed.
2. Each complexType element is mapped to a class, which will be used as a table. Its attributes and subelements are mapped to the properties of the class.
3. Each class is transformed into a table, and its properties are transformed into columns of the table. Concretely:
   1) The name of the table is the same as the name of the class, and the name of each scalar property is used as the name of a column. A primary key is added to each table.

3.2. RTO Description
The entity-relation model is currently the most widely used way of organizing databases, and it expresses the relationships between data clearly. We can therefore extract metadata information from a relational database to construct OWL ontologies. The OWL ontology contains:
1. vocabularies for describing relational database systems, such as rdb:DBName, rdb:Relation, rdb:RelationList, rdb:Table, rdb:Attribute, rdb:PrimaryKeyAttribute, rdb:ForeignKeyAttribute and so on.
2. semantic relationships between vocabularies, such as rdb:hasRelation, rdb:hasAttribute, rdb:primaryKey, rdb:hasType, rdb:isNullable and so on.
3. restrictions on the vocabularies and their semantic relationships, such as: each relation has zero or more attributes, each attribute has exactly one type, etc.
The RTO mapping approach is described below:
1. Each table is mapped to an instance of type rdb:Relation and then added to type rdb:RelationList.
2. Each attribute is mapped to an instance of type rdb:Attribute, and an instance of type rdb:hasType is generated simultaneously. If the attribute is a foreign key, an instance of type rdb:ReferenceAttribute and an instance of type rdb:ReferenceRelation are generated to represent this information. The restrictions of each instance of type rdb:Attribute are then generated, such as cardinality restrictions and foreign key restrictions. In fact, we use the foreign key restriction to represent the subclass property in the XML document.

4. Case Study
To illustrate how our approach can be used to transform XML data into an entity-relation model and then into an OWL instance, we now give an example. A part of the XML sample might look like this:
    <BOOK>
        <BOOK_ID>BK-001</BOOK_ID>
        <TITLE>An XML primer</TITLE>
        <AUTHOR>John</AUTHOR>
        <PRINTER>
            <PRINTER_ID>GB2000</PRINTER_ID>
            <PRINTER_NAME>XianDai</PRINTER_NAME>
            <CITY>Beijing</CITY>
        </PRINTER>
    </BOOK>
This fragment contains the following elements: BOOK, BOOK_ID, TITLE, AUTHOR, PRINTER, PRINTER_ID, PRINTER_NAME, CITY. Among these elements, BOOK and PRINTER are complexType, while the others are all simpleType. According to our XTR mapping algorithm, we will get the entity-relation model as below:
    Class BOOK {
        BOOK_ID string;
        TITLE string;
        AUTHOR string;
        PRINTER_ID string;
    }
    Class PRINTER {
        PRINTER_ID string;
        PRINTER_NAME string;
        CITY string;
    }
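The XTR steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `xtr_tables` and `is_complex` are hypothetical, the tag layout of `SAMPLE` is a reconstruction from the element names and values listed in the case study, and two rules are assumptions made only for this sketch: the first scalar column of a table serves as its primary key, and a parent table refers to a nested complexType child through a column named after the child's primary key. The sketch also classifies each element instance on its own, whereas the definition above requires a simpleType to be a leaf everywhere in the document.

```python
import xml.etree.ElementTree as ET

# Reconstructed sample (assumption): tags inferred from the case-study element list.
SAMPLE = """
<BOOK>
  <BOOK_ID>BK-001</BOOK_ID>
  <TITLE>An XML primer</TITLE>
  <AUTHOR>John</AUTHOR>
  <PRINTER>
    <PRINTER_ID>GB2000</PRINTER_ID>
    <PRINTER_NAME>XianDai</PRINTER_NAME>
    <CITY>Beijing</CITY>
  </PRINTER>
</BOOK>
"""


def is_complex(elem):
    """complexType: the element has subelements or attributes."""
    return len(elem) > 0 or bool(elem.attrib)


def xtr_tables(root):
    """Map an XML tree to {table name: [column names]}."""
    tables = {}

    def visit(elem):
        # Step 2/3: a complexType element becomes a class, i.e. a table.
        columns = tables.setdefault(elem.tag, [])

        def add(name):
            if name not in columns:
                columns.append(name)

        # Step 1: attributes and simpleType subelements become scalar columns.
        for attr_name in elem.attrib:
            add(attr_name)
        for child in elem:
            if is_complex(child):
                visit(child)
                child_columns = tables[child.tag]
                if child_columns:
                    # Assumption: first scalar column acts as the child's key,
                    # and the parent references it by the same column name.
                    add(child_columns[0])
            else:
                add(child.tag)

    visit(root)
    return tables


tables = xtr_tables(ET.fromstring(SAMPLE))
for name, columns in tables.items():
    print(name, columns)
```

Under these assumptions the sketch yields a BOOK table and a PRINTER table whose columns match the two classes listed above.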
According to the XTR mapping, there will be two tables in this database, as illustrated in Fig. 1:

Table 1. BOOK
BOOK_ID | TITLE | AUTHOR | PRINTER_ID

Table 2. PRINTER
PRINTER_ID | PRINTER_NAME | CITY

Fig. 1. The tables in the entity-relation model generated from book.xml

Suppose our database is saved in local host, and the OWL ontology describing the database has a namespace “http://localhost/book.owl”, which can be changed by users. The following segment describes the relations in the ontology:

[OWL listing, numbered lines 1 to 36 and continuing; the markup was lost in extraction. The surviving fragments are rdf:resource="http://localhost/book/BOOK/BOOK.owl#BOOK_ID" and four repeated pairs of the values false and string.]
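The RTO steps for these two tables can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation and not the lost OWL listing. The function name `rto_triples`, the IRI pattern built from `BASE`, and the choice of primary and foreign keys are hypothetical. The rdb: terms come from Section 3.2, but rdb:ReferenceRelation and rdb:ReferenceAttribute are used here as linking predicates for brevity, whereas the paper describes them as types with generated instances. The type `string` and nullability `false` are assumed for every column.

```python
BASE = "http://localhost/book.owl#"  # namespace named in the case study

# Tables produced by the XTR step for the book example.
TABLES = {
    "BOOK": ["BOOK_ID", "TITLE", "AUTHOR", "PRINTER_ID"],
    "PRINTER": ["PRINTER_ID", "PRINTER_NAME", "CITY"],
}
# Assumed keys (not stated explicitly in the paper).
PRIMARY_KEYS = {"BOOK": "BOOK_ID", "PRINTER": "PRINTER_ID"}
FOREIGN_KEYS = {("BOOK", "PRINTER_ID"): ("PRINTER", "PRINTER_ID")}


def rto_triples(tables, primary_keys, foreign_keys, base=BASE):
    """Return (subject, predicate, object) triples describing the tables."""
    triples = []
    for table, columns in tables.items():
        relation = base + table
        # RTO step 1: each table becomes an rdb:Relation in the relation list.
        triples.append((relation, "rdf:type", "rdb:Relation"))
        triples.append(("rdb:RelationList", "rdb:hasRelation", relation))
        for column in columns:
            attribute = f"{relation}.{column}"
            # RTO step 2: each column becomes an rdb:Attribute with a type.
            triples.append((attribute, "rdf:type", "rdb:Attribute"))
            triples.append((relation, "rdb:hasAttribute", attribute))
            triples.append((attribute, "rdb:hasType", "string"))
            triples.append((attribute, "rdb:isNullable", "false"))
            if primary_keys.get(table) == column:
                triples.append((relation, "rdb:primaryKey", attribute))
            target = foreign_keys.get((table, column))
            if target is not None:
                ref_table, ref_column = target
                # Foreign key: record the referenced relation and attribute.
                triples.append((attribute, "rdb:ReferenceRelation", base + ref_table))
                triples.append(
                    (attribute, "rdb:ReferenceAttribute", f"{base}{ref_table}.{ref_column}")
                )
    return triples


triples = rto_triples(TABLES, PRIMARY_KEYS, FOREIGN_KEYS)
for triple in triples:
    print(triple)
```

Plain tuples are used instead of an RDF library to keep the sketch dependency-free; serializing the triples to OWL/RDF-XML would be a separate step.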