2010 International Conference on Web Information Systems and Mining

Towards Integration Rules of Mapping from Relational Databases to Semantic Web Ontology

Hazber Mohamed, Yang Jincai, Jin Qian
Computer Science and Technology Department, Huazhong Normal University, Wuhan, China
[email protected], [email protected]

This paper is organized as follows: Section 2 introduces related work. Section 3 outlines the semantic web ontology language concepts. Sections 4 and 5 present our approach to mapping SQL DDL, schema, and data rows to a semantic web ontology. Finally, Section 6 gives a conclusion and points out future work.

Abstract- This paper proposes integration rules for mapping a relational database (SQL DDL, schema, and data) into a semantic web ontology such as OWL, which is built on top of RDF using the RDFS vocabulary and XML Schema data types. Different cases are considered, such as tables, relationships, composite columns, single or multiple primary keys, foreign keys, and constraints (CHECK BETWEEN, CHECK LIKE, etc.); one of these cases is then used to map data rows to ontology instances. This approach reduces the cost and time of building an ontology (OWL).

II. RELATED WORK

Because ontologies play an important role in data integration, it is necessary to transform relational databases into ontologies automatically. A number of researchers have contributed to this problem, for example by providing methods and tools for mapping relational databases into the RDF model [9], and by mapping between RDB and ontology [16]. The latter work proposed mapping rules between a relational database schema and an OWL ontology for deep annotation, where ontological annotation is used for dynamic web page contents extracted from the database. Their framework, DPAnnotator, translates the ER schema of the RDB into an OWL ontology, and their D2OMapper tool automatically creates the mappings by following these rules. Most existing research in semantic extraction also focuses on how to extract an ontology directly from specific schemata [10], [15]. For example, [10] gives an approach for automatically formulating an OWL ontology from data in relational databases using a set of learning rules, and [15] describes groups of semantic mapping rules for extracting a global OWL ontology from a relational database. Several related works address the transformation of RDB to ontology [11], [12], [13]. Yet these approaches suffer from at least one of the following:
• They may not describe the ontology from the RDB directly and correctly.
• The transformation structure is too simple. For example, primary keys (PK) and foreign keys (FK) are assumed to be single-column, and a simple relationship is assumed to be one-to-one.
• Some papers ignore constraints, and others assume only simple constraints.
• They are not implemented.
• The data integration scenario is too complex, or requires more flexibility than the existing approaches allow.
• They transform only the structure and not the data.
Our approach provides new rules for mapping a relational database to a semantic web ontology directly and automatically.

Keywords- Relational Databases; RDF; RDFS; OWL; Semantic Web Ontology

I. INTRODUCTION

The Semantic Web is an extension of the “existing” Web that provides a common framework for making web contents machine-understandable. This allows data to be shared and reused across applications and enterprises. In the semantic web, information is given a well-defined meaning by representing it in languages such as RDF and OWL and by linking it to commonly accepted ontologies. Ontologies are a key component in realizing the vision of the semantic web: they give more structure and computer-understandable meaning to the data on the WWW, better enabling computers and people to work in cooperation [1]. The W3C has recommended several formats for representing data in the semantic web, including RDF [2], RDF Schema [3] and OWL [4]. These languages have been developed to represent machine-understandable semantics and to facilitate more intelligent ways of processing information.

Furthermore, the bulk of today's web content is stored in RDBs. The capacity to publish it in the semantic web is significant not only for the development of the latter, but also because it allows search engines to return more relevant Deep Web search results [5]. The success of the semantic web also depends on the mass creation of semantic data. The source of data for a web page is generally a relational database; most of the data used in generating web pages is extracted from databases [6]. Yet, organizationally, integrating a database with the semantic web is hardly ever a priority. Semantic web applications require that ontologies exist for all data over which they operate; ontologies make the data readable and understandable by machines, which improves system interoperation and knowledge sharing [7]. The ontology therefore plays an important role in data integration and in the description and retrieval of data in an RDB. Ontologies are especially important for heterogeneous database integration [8]: databases represented by ontologies can be merged efficiently.
978-0-7695-4224-9/10 $26.00 © 2010 IEEE DOI 10.1109/WISM.2010.21

RELATED WORK

III. SEMANTIC WEB ONTOLOGY LANGUAGE CONCEPTS

The semantic web stack [14] includes the standards XML, XML Schema (XMLS), RDF, RDFS, and OWL. OWL adds data modeling semantics that are more powerful than those of conventional databases. The ontology languages are RDF, RDFS, and OWL; all of them have an XML serialization.

RDFS

A. Resource Description Framework “RDF”
RDF is a standard language for expressing a data model as statements with a subject “S”, a predicate “P”, and an object “O”; nodes and edges are represented by URIs and by literals such as a string or a number. It is the first proper layer of the semantic web. RDF is a simple metadata representation framework that uses URIs to identify web-based resources and a graph model to describe relationships between resources. In Figure 1, each arc (edge) in the RDF model is called a statement or triple, T = {S, P, O}. Each triple asserts a fact about a resource. The subject is the resource from which the arc leaves, the predicate is the property that labels the arc, and the object is the resource, blank node (BN), or literal pointed to by the arc.
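The triple model can be made concrete with a small sketch. It encodes the example statement of Figure 1 ("Mohammed is from Yemen, his age is 30") as two triples; the `ex:` prefix, the property names `ex:isFrom` and `ex:age`, and the helper `to_statement` are illustrative assumptions, not names taken from the paper.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One RDF statement T = {S, P, O}."""
    subject: str
    predicate: str
    obj: Union[str, int]  # a URI, a blank node label, or a literal


# Hypothetical encoding of the Figure 1 statement as two triples.
triples = [
    Triple("ex:Mohammed", "ex:isFrom", "ex:Yemen"),  # object is a resource
    Triple("ex:Mohammed", "ex:age", 30),             # object is an integer literal
]


def to_statement(t: Triple) -> str:
    """Render a triple in a Turtle-like form; integers become xsd:int literals."""
    obj = f'"{t.obj}"^^xsd:int' if isinstance(t.obj, int) else t.obj
    return f"{t.subject} {t.predicate} {obj} ."


for t in triples:
    print(to_statement(t))
```

Both triples share the same subject, which is how a single resource accumulates several facts in the RDF graph.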

RDFS: rdfs:Class, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, rdfs:Individual, rdfs:label, rdfs:comment

OWL:
Class-related constructs | Property characteristics | Restricted cardinality
owl:Class | owl:ObjectProperty | owl:cardinality
owl:complementOf | owl:DatatypeProperty | owl:minCardinality
owl:DeprecatedClass | owl:FunctionalProperty | owl:maxCardinality
owl:disjointWith | owl:TransitiveProperty | owl:allValuesFrom
owl:equivalentClass | owl:SymmetricProperty | owl:hasValue
owl:intersectionOf | owl:InverseFunctionalProperty | owl:onProperty
owl:oneOf | owl:DataRange | owl:Restriction
owl:unionOf | owl:inverseOf | owl:someValuesFrom
Other: owl:sameAs, owl:differentFrom, owl:AllDifferent, owl:distinctMembers, owl:equivalentProperty

[Figure 1 labels: RDF triple; Subject; Predicate (attribute of a URI); Object; Blank Node (BN); literal types: 1-String, 2-Integer, 3-etc.]

[Figure 1 labels: O, URI, P, Attribute, rowID, xsd:string, Value]

IV. RULES OF MAPPING SQL DDL AND SCHEMA TO SEMANTIC WEB ONTOLOGY

A. Mapping Tables
Each table in a relational database should be mapped to a class in the ontology with the same name as the table, and the comment of the table should be transformed into the comment of the class.
Case 1: Assume that we create a table with the following SQL DDL syntax:


restricted Cardinality

The mapping process is done progressively based on the following rules:

[Figure 1 example statement: St = "Mohammed is from Yemen, his age is 30"]

Property Characteristics

rdfs:domain,

xsd:int

Create Table Table_1 (T1_Id varchar2 (6) Primary key, T1_F2 integer);

Figure 1. RDF triple and RDF graph

The table Table_1 is mapped to a class whose name is the table name.

B. Resource Description Framework Schema “RDFS”
RDF Schema defines a number of classes and properties that have specific semantics. A class is a set of resources, and corresponds to the notion of type or category in other representations. The most important concepts of RDF and RDFS are shown in Table 1.

The class carries the annotation rdfs:comment "comment of Table_1".
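As a minimal sketch of the table rule, the function below emits a Turtle-style class declaration for a table. The OWL listing of the original example did not survive, so this is an illustration rather than the authors' output; the `ex:` prefix and the function name `table_to_class` are assumptions.

```python
def table_to_class(table: str, comment: str = "") -> str:
    """Map a table to an owl:Class with the same name.

    The table comment, if any, becomes the rdfs:comment of the class.
    The ex: prefix is a placeholder namespace (an assumption).
    """
    lines = [f"ex:{table} a owl:Class"]
    if comment:
        lines.append(f'    rdfs:comment "{comment}"')
    return " ;\n".join(lines) + " ."


print(table_to_class("Table_1", "comment of Table_1"))
```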

Case 2: Assume that we create tables with the following SQL DDL:
Create Table Table_2(T2_Id varchar2(6) primary key);
Create Table Table_3(T1_Id varchar2(6) References Table_1, T2_Id varchar2(6) References Table_2, Primary key(T1_Id, T2_Id));
Figure 2 shows the schema of the relationship.

TABLE 1. Concepts of RDF/RDFS
Subject | Predicate | Object | Simply says
rdfs:Resource | rdf:type | rdfs:Class | Resource is a type of Class
rdfs:Class | rdf:type | rdfs:Class | Class is a type of Class
rdf:Property | rdf:type | rdfs:Class | Property is a type of Class
rdf:type | rdf:type | rdf:Property | type is a type of Property
rdfs:label | rdf:type | rdf:Property | label is a type of Property
rdfs:comment | rdf:type | rdf:Property | comment is a type of Property

Figure 2. Table_3 as a relationship between Table_1 and Table_2

As Figure 2 shows, an m:n relationship cannot be represented directly in an RDB, but it can be implemented by adding a third table such as Table_3, which breaks the m:n relationship between Table_1 and Table_2 into (at least) two 1:n relationships. The PK of Table_3 is composed of two FK columns. These foreign keys refer to the two other tables (Table_1, Table_2), indicating a binary (m:n) relationship. Since FK1 (T1_Id) and FK2 (T2_Id) are both part of the PK (T1_Id, T2_Id), they are mapped to two object properties: T1_Id uses the classes Table_2 (destination) and Table_1 (source) as its domain and range, respectively, and T2_Id is the inverseOf T1_Id, meaning that the relationship is bidirectional.
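The join-table rule can be sketched as follows. The function emits the two object properties described above in a Turtle-like notation; the domain and range of the second property are inferred here from the inverseOf relation, and the `ex:` prefix and function name are assumptions made for illustration.

```python
def join_table_to_properties(fk1: str, fk2: str, source: str, destination: str) -> str:
    """Map the two FK columns of an m:n join table to a pair of inverse object properties.

    fk1 points from the destination class to the source class, as in the text;
    fk2 is declared as its inverse, so its domain and range are swapped.
    """
    return "\n".join([
        f"ex:{fk1} a owl:ObjectProperty ;",
        f"    rdfs:domain ex:{destination} ;",
        f"    rdfs:range ex:{source} .",
        f"ex:{fk2} a owl:ObjectProperty ;",
        f"    rdfs:domain ex:{source} ;",
        f"    rdfs:range ex:{destination} ;",
        f"    owl:inverseOf ex:{fk1} .",
    ])


print(join_table_to_properties("T1_Id", "T2_Id", source="Table_1", destination="Table_2"))
```

Note that under this rule Table_3 itself does not become a class; only its two foreign keys survive, as properties linking Table_1 and Table_2.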

RDFS is an ontology language written in RDF that supports the concepts of resources and properties, sub/super-classing, instantiation, and inheritance.

C. Web ontology language “OWL”
Ontology is very useful for knowledge representation. An ontology is defined as “a formal, explicit specification of a shared conceptualization”, and it encompasses the following concepts: classes, relationships between classes, properties of classes, and constraints on the relationships between the classes and on the properties of the classes. OWL is a semantic markup language for publishing and sharing ontologies on the WWW; it is intended to provide a language that can be used to describe classes and the relations between them. The most important RDF, RDFS, and OWL elements are shown in Table 2.



TABLE 2. The important elements of RDF/RDFS/OWL
RDF: rdf:type, rdf:datatype, rdf:Property, rdf:resource, rdf:parseType, rdf:ID, rdf:first, rdf:rest, rdf:List, rdf:Description, rdf:value

B. Mapping Table Columns (attributes)
Case 1: Simple Column
A column in the relational schema that is neither a PK nor an FK can be mapped to a DatatypeProperty with the same name as the column, accompanied by a maximum cardinality of 1 in the ontology. The rdfs:label corresponds to the column name, and the rdfs:comment corresponds to the column description. The domain is the class created for the table, and the range of the datatype property is the XSD data type equivalent to the data type of the original column in the database.
Create Table Table_NO_FK (T_Id varchar2 (6));
Let the description of column T_Id be "used to save the ID of an employee".
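A minimal sketch of the simple-column rule is given below, assuming a Turtle-like output and a placeholder `ex:` namespace. The function name and its parameters are hypothetical; the maximum cardinality of 1 is expressed as a restriction on the table's class.

```python
def column_to_datatype_property(table: str, column: str, xsd_type: str,
                                description: str = "") -> str:
    """Map a non-key column to an owl:DatatypeProperty with max cardinality 1."""
    lines = [
        f"ex:{column} a owl:DatatypeProperty ;",
        f'    rdfs:label "{column}" ;',
    ]
    if description:
        lines.append(f'    rdfs:comment "{description}" ;')
    lines += [
        f"    rdfs:domain ex:{table} ;",   # class created for the table
        f"    rdfs:range {xsd_type} .",    # XSD equivalent of the SQL type
        f"ex:{table} rdfs:subClassOf [ a owl:Restriction ;",
        f"    owl:onProperty ex:{column} ;",
        '    owl:maxCardinality "1"^^xsd:nonNegativeInteger ] .',
    ]
    return "\n".join(lines)


print(column_to_datatype_property(
    "Table_NO_FK", "T_Id", "xsd:string",
    description="used to save the ID of an employee"))
```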

D. Mapping Constraints
SQL supports the constraints NOT NULL, UNIQUE, PRIMARY KEY on single or multiple columns, REFERENCES, many cases of FOREIGN KEY, CHECK on single or multiple columns, CHECK BETWEEN, CHECK LIKE, etc.
Case 1: NOT NULL Constraints
In a relational table, the NOT NULL column constraint is mapped to a minCardinality of 1.

[Listing residue for Table_NO_FK: label "T_ID" (or "Emp_ID" as the caption of the column); comment "used to save the number of the employee"]

Create Table Table_Constraint_NOTNULL (T_Id Integer NOT NULL);

The NOT NULL constraint specifies that column T_Id in table Table_Constraint_NOTNULL cannot be null. Therefore, this constraint is mapped to a minimum cardinality of 1.

Case 2: Composite Column
A composite column consists of a group of values from more than one domain. For example, assume we have the following table: Employee (name, address), where address is a composite column (city text, country text). There are two ways to map a composite column to OWL datatype properties. The first is to map only its simple component columns (city, country) to datatype properties and to ignore the composite column (address) itself.

Second is to map composite column to datatype property and then map its simple component columns to subProperty of corresponding datatype property.
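The second option can be sketched as below for the Employee example. The output notation (Turtle-like), the `ex:` prefix, and the choice of xsd:string for the `text` components are assumptions made for illustration.

```python
def composite_to_subproperties(table: str, composite: str, components: dict) -> str:
    """Map a composite column to a datatype property and each of its
    component columns to a sub-property of that property."""
    out = [
        f"ex:{composite} a owl:DatatypeProperty ;",
        f"    rdfs:domain ex:{table} .",
    ]
    for name, xsd_type in components.items():
        out += [
            f"ex:{name} a owl:DatatypeProperty ;",
            f"    rdfs:subPropertyOf ex:{composite} ;",
            f"    rdfs:range {xsd_type} .",
        ]
    return "\n".join(out)


print(composite_to_subproperties(
    "Employee", "address", {"city": "xsd:string", "country": "xsd:string"}))
```

The first option corresponds to emitting only the component properties and dropping the `address` property and the `rdfs:subPropertyOf` lines.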

……..

………..

Case 2: UNIQUE Constraints
The UNIQUE column constraint is mapped to an inverse functional property in the ontology.
Create Table Table_Constraint_Unique (T_Id Integer UNIQUE);

The UNIQUE constraint in table Table_Constraint_Unique specifies that the values of column T_Id are unique. Therefore, this constraint is mapped to an inverse functional property.

Case 3: PRIMARY KEY (PK) Constraints
A PK combines a UNIQUE and a NOT NULL constraint. There are two forms of the PK constraint: one refers to a single column and the other to multiple columns. A PK column constraint is mapped to both an inverse functional property and a minimum cardinality of 1.
Create Table Table_Constraint_PrimaryKey(T_Id Integer Primary Key);
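Cases 1 to 3 can be summarised in one small sketch: NOT NULL yields a minimum cardinality of 1, UNIQUE yields an inverse functional property, and a single-column PK yields both. The function name, its flags, and the `ex:` prefix are hypothetical.

```python
def constraint_axioms(table: str, column: str, not_null: bool = False,
                      unique: bool = False, primary_key: bool = False) -> list:
    """Return the axioms (Turtle-like strings) implied by column constraints."""
    if primary_key:
        # A PK combines UNIQUE and NOT NULL.
        not_null = unique = True
    axioms = []
    if unique:
        axioms.append(f"ex:{column} a owl:InverseFunctionalProperty .")
    if not_null:
        axioms.append(
            f"ex:{table} rdfs:subClassOf [ a owl:Restriction ; "
            f"owl:onProperty ex:{column} ; "
            'owl:minCardinality "1"^^xsd:nonNegativeInteger ] .'
        )
    return axioms


for axiom in constraint_axioms("Table_Constraint_PrimaryKey", "T_Id", primary_key=True):
    print(axiom)
```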

C. Mapping Data Type
Most of the work in mapping columns concerns the mapping of data types from SQL to XSD. The most commonly used data types and their mapping from SQL to XSD are shown in Table 3.

Case 4: FOREIGN KEY (FK) and REFERENCES
An FK is a column or set of columns that relates two tables. REFERENCES is a column constraint, whereas FOREIGN KEY is a table constraint. Both constraints are used for specifying foreign keys. An FK can be mapped to four different constructs in the ontology: an ObjectProperty, class inheritance, a SymmetricProperty, and a TransitiveProperty.
Case 4.1: Constraint with REFERENCES Column

Create Table Table_ReferencesDataSource(T_RDS_Id varchar(6) Primary key, T_RDS_Name varchar(50), post Bit);
Create Table Table_References(T_R_Id Integer Primary Key, T_RDS_Id varchar(6) References Table_ReferencesDataSource);
Figure 3 shows the schema.

Figure 3. Constraint with REFERENCES column

TABLE 3. XSD data types used for mapping from SQL
Type | SQL DDL data type | XSD
Byte | BIT VARYING, TINYINT | xsd:byte, xsd:unsignedByte
Boolean | BIT, BOOLEAN | xsd:boolean
XML | XML | xsd:anyType
Char | CHAR, VCHAR, VARCHAR, NCHAR, NVARCHAR, TEXT, LONGTEXT | xsd:string
Binary | VARBINARY, BINARY, BLOB, IMAGE, LONGBLOB, MEDIUMBLOB, TINYBLOB | xsd:hexBinary
Numeric | INTEGER | xsd:integer, xsd:positiveInteger, xsd:negativeInteger, xsd:nonPositiveInteger, xsd:nonNegativeInteger, xsd:unsignedInt, xsd:unsignedLong, xsd:int, xsd:long
Numeric | SMALLINT, TINYINT | xsd:short, xsd:unsignedShort
Numeric | FLOAT, REAL | xsd:float
Numeric | NUMERIC, DECIMAL, MONEY | xsd:decimal
Numeric | DOUBLE PRECISION | xsd:double
Date/Time | DATE | xsd:date
Date/Time | TIMESTAMP | xsd:dateTime
Date/Time | TIME, TIME WITH TIME ZONE | xsd:time
Date/Time | TIMESTAMP WITH TIME | xsd:dateTime
Date/Time | INTERVAL | xsd:duration
Part of date/time | | xsd:gYear, xsd:gMonth, xsd:gDay, xsd:gYearMonth, xsd:gMonthDay
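The SQL-to-XSD correspondences of Table 3 lend themselves to a simple lookup. The sketch below covers only a subset of the table and makes two assumptions that are not in the paper: where Table 3 lists several XSD candidates (for example for INTEGER) a single one is picked, and unknown types such as the Oracle-specific varchar2 fall back to xsd:string.

```python
import re

# Subset of Table 3; one XSD type is chosen where the table lists several.
SQL_TO_XSD = {
    "BIT": "xsd:boolean", "BOOLEAN": "xsd:boolean",
    "CHAR": "xsd:string", "VARCHAR": "xsd:string", "NCHAR": "xsd:string",
    "NVARCHAR": "xsd:string", "TEXT": "xsd:string",
    "INTEGER": "xsd:integer", "SMALLINT": "xsd:short",
    "FLOAT": "xsd:float", "REAL": "xsd:float",
    "NUMERIC": "xsd:decimal", "DECIMAL": "xsd:decimal",
    "DOUBLE PRECISION": "xsd:double",
    "DATE": "xsd:date", "TIMESTAMP": "xsd:dateTime", "TIME": "xsd:time",
    "INTERVAL": "xsd:duration", "BLOB": "xsd:hexBinary", "XML": "xsd:anyType",
}


def sql_type_to_xsd(sql_type: str, default: str = "xsd:string") -> str:
    """Strip any length/precision suffix such as (6) or (10,2), then look up the type."""
    base = re.sub(r"\(.*\)", "", sql_type).strip().upper()
    return SQL_TO_XSD.get(base, default)


print(sql_type_to_xsd("varchar(50)"))
print(sql_type_to_xsd("Integer"))
```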


class inheritance: Dormitory_foreign_students is a subclass of Foreign_students.
Case 4.6: Foreign Key as Unary (Recursive) Relation is mapped to Symmetric Property

In Figure 3, the REFERENCES constraint specifies that column T_RDS_Id in table Table_References is an FK to another table, Table_ReferencesDataSource, indicating a binary (one-to-zero-or-one, one-to-one, or many-to-one) relationship. Since the FK is not part of the primary key, it is mapped to an object property T_RDS_Id that uses the classes Table_References and Table_ReferencesDataSource as its domain and range, respectively.
Case 4.2: Foreign Key is mapped to Object Property and allValuesFrom
The property is restricted to all values from the class Table_ReferencesDataSource, because the FK implies that for each (non-null) value of column T_RDS_Id there is the same value in table Table_ReferencesDataSource.
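Cases 4.1 and 4.2 can be sketched together: the FK column becomes an object property from the referencing class to the referenced class, and the referencing class is restricted with allValuesFrom. The Turtle-like notation, the `ex:` prefix, and the function name are assumptions.

```python
def fk_to_object_property(table: str, column: str, referenced: str) -> str:
    """Map an FK that is not part of the PK to an object property (Case 4.1)
    plus an allValuesFrom restriction on the referencing class (Case 4.2)."""
    return "\n".join([
        f"ex:{column} a owl:ObjectProperty ;",
        f"    rdfs:domain ex:{table} ;",
        f"    rdfs:range ex:{referenced} .",
        f"ex:{table} rdfs:subClassOf [ a owl:Restriction ;",
        f"    owl:onProperty ex:{column} ;",
        f"    owl:allValuesFrom ex:{referenced} ] .",
    ])


print(fk_to_object_property(
    "Table_References", "T_RDS_Id", "Table_ReferencesDataSource"))
```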

Create Table Accounts_of_company(Account_No varchar(20) Primary key,Account_no_parent varchar(20) References Accounts_of_company);

The REFERENCES constraint in Accounts_of_company specifies that column Account_no_parent is an FK to the same table, indicating a unary relationship. Therefore, the FK is mapped to a symmetric property Account_no_parent that uses the class Accounts_of_company as both its domain and its range.
Case 4.7: Foreign Key as DELETE CASCADE is mapped to TransitiveProperty



Create Table Accounts_of_company(Account_No varchar(20) Primary key, Account_no_parent varchar(20) References Accounts_of_company ON DELETE CASCADE);

Case 4.3: Foreign Key is mapped to Object Property with minCardinality of 1
If we alter table Table_References by adding NOT NULL to column T_RDS_Id, the FK can no longer be null, and it is mapped to an object property T_RDS_Id with a minCardinality of 1.

The REFERENCES constraint specifies that column Account_no_parent is an FK to the same table, again indicating a unary relationship. However, since the FK is now accompanied by the ON DELETE CASCADE option, the relationship consists of a whole and a part, where the part cannot exist without the whole. Therefore, the FK is mapped to a transitive property Account_no_parent that uses the class Accounts_of_company as both its domain and its range.
Case 5: CHECK Constraints
The CHECK constraint is used to limit the range of values that can be placed in a column.
Case 5.1: Constraint CHECK on single column



Case 4.4: Foreign Key is mapped to Object Property with cardinality of 1
Let us alter table Table_References by adding primary key(T_R_Id, T_RDS_Id). The REFERENCES constraint in table Table_References specifies that column T_RDS_Id is an FK to another table, Table_ReferencesDataSource, again indicating a binary relationship. However, since the FK is now part of the PK, it is mapped to an object property T_RDS_Id with a cardinality of 1.
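The FK cases of this section (4.1 to 4.7) can be read as a small decision procedure. The sketch below is one possible ordering of the checks, which is an assumption: the paper lists the cases but does not state a precedence among them. The `ForeignKey` record and `classify_fk` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ForeignKey:
    table: str
    columns: tuple
    referenced_table: str
    primary_key: tuple = ()          # PK columns of the referencing table
    not_null: bool = False
    on_delete_cascade: bool = False


def classify_fk(fk: ForeignKey) -> str:
    """Pick the ontology construct for a foreign key, following Cases 4.1 to 4.7."""
    fk_cols, pk_cols = set(fk.columns), set(fk.primary_key)
    if fk.table == fk.referenced_table:
        # Unary (recursive) relationship: Cases 4.6 and 4.7.
        return "owl:TransitiveProperty" if fk.on_delete_cascade else "owl:SymmetricProperty"
    if pk_cols and fk_cols == pk_cols:
        return "rdfs:subClassOf"                            # Case 4.5: FK is the PK
    if pk_cols and fk_cols <= pk_cols:
        return "owl:ObjectProperty, owl:cardinality 1"      # Case 4.4: FK is part of the PK
    if fk.not_null:
        return "owl:ObjectProperty, owl:minCardinality 1"   # Case 4.3
    return "owl:ObjectProperty, owl:allValuesFrom"          # Cases 4.1 and 4.2


print(classify_fk(ForeignKey("Dormitory_foreign_students", ("ID_NO",),
                             "Foreign_students", primary_key=("ID_NO",))))
```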

Create Table Student_master_degree(Previous_level Varchar(5) Check (Previous_level='BA'));

The CHECK constraint specifies that all rows in Student_master_degree (only students who have a bachelor degree) have the value BA for column Previous_level. Therefore, the datatype property Previous_level is restricted to have the same value, BA, for all instances of the class Student_master_degree.

Case 4.5: Foreign Key is mapped to Class inheritance
Create table Foreign_students(ID_NO integer Primary key, Name varchar(50));
Create table Dormitory_foreign_students(ID_NO integer primary key, Foreign Key(ID_NO) References Foreign_students);

Case 5.2: Constraint CHECK on multiple columns
Create Table Students_master_degree(St_Id Integer, Previous_level varchar(5), Average_score Real, Check(Previous_level='BA' and Average_score > 84));

The CHECK constraint specifies that all rows in table Students_master_degree describe students who have a bachelor degree and an average score greater than 84% (that is, at least 85% for integer scores). Therefore, the mapping restricts Previous_level to the value BA and Average_score to a minimum of 85.

Here, the FK constraint specifies that column ID_NO in table Dormitory_foreign_students is an FK to another table, Foreign_students, again indicating a binary (one-to-one) relationship. However, since the FK is now also the PK, it is mapped to class inheritance: Dormitory_foreign_students is a subclass of Foreign_students.

VI. CONCLUSION AND FUTURE WORK

This paper presents a new approach for transforming relational database DDL into a semantic web ontology built in OWL on top of RDF, using the RDFS and XSD vocabularies. We chose DDL because of its expressivity and availability. Different cases are considered, such as tables, relationships, single columns, composite columns, data types, and constraints. They are mapped into the ontology, and data rows are mapped into ontology instances. This approach will help software engineers to rapidly develop semantic web software on top of relational databases. This research can be extended by mapping other cases of integrating relational databases with the semantic web.

Case 5.3: Constraint CHECK Between
Create Table Check_between(age Integer, Check(age between 20 and 40));

The CHECK constraint in table Check_between specifies that every row must have an age of at least 20 years and at most 40 years. Therefore, the mapping restricts age to a minimum of 20 and a maximum of 40.
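The original OWL listing for this case did not survive, so the following is only one way such a range could be written. It assumes the OWL 2 datatype-restriction vocabulary with the XSD facets xsd:minInclusive and xsd:maxInclusive, which may differ from the encoding the authors used; the `ex:` prefix and both function names are hypothetical.

```python
def check_between_to_range(column: str, low: int, high: int,
                           base: str = "xsd:integer") -> str:
    """Express CHECK (column BETWEEN low AND high) as a restricted datatype range."""
    return (
        f"ex:{column} rdfs:range [ a rdfs:Datatype ;\n"
        f"    owl:onDatatype {base} ;\n"
        f"    owl:withRestrictions ( [ xsd:minInclusive {low} ] "
        f"[ xsd:maxInclusive {high} ] ) ] ."
    )


def satisfies_between(value: int, low: int, high: int) -> bool:
    """BETWEEN is inclusive at both ends, like the two facets above."""
    return low <= value <= high


print(check_between_to_range("age", 20, 40))
```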

Case 5.4: Constraint CHECK with Enumeration

REFERENCES

Create Table Account_of_company(Acc_currency varchar(5) Check(Acc_currency In('USD','RMB')));

The CHECK constraint specifies the range of column Acc_currency through a list of values, 'USD' and 'RMB'. Therefore, this constraint is mapped to an enumerated data type with one element for each value in the list: USD and RMB.
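An enumerated data type of this kind can be sketched with owl:DataRange and owl:oneOf, both of which appear in Table 2. The Turtle-like notation, the `ex:` prefix, and the function name are assumptions; the authors' own listing is not available.

```python
def check_in_to_enumeration(column: str, values: list) -> str:
    """Express CHECK (column IN (...)) as an enumerated data range,
    with one list member per allowed value."""
    members = " ".join(f'"{v}"' for v in values)
    return f"ex:{column} rdfs:range [ a owl:DataRange ; owl:oneOf ( {members} ) ] ."


print(check_in_to_enumeration("Acc_currency", ["USD", "RMB"]))
```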

Case 5.5: Constraint CHECK Like
Create Table Id_Card(card_name varchar(30), Nationality char(25), CHECK (Nationality LIKE 'Y%'));

The table Id_Card has a CHECK restriction under which Nationality must begin with the letter 'Y'. This restriction can be mapped to a class beginning_with_Y, which denotes the string values that begin with the letter 'Y'.

V. MAPPING DATA ROWS TO ONTOLOGY INSTANCES

By applying the rules of the previous cases, an ontological structure can be extracted from a relational schema (tables). The values of a row in a table can then be mapped to the values of the corresponding properties of an ontology instance. Assume we want to add a row to the table Student_master_degree with the SQL statement: INSERT INTO Student_master_degree VALUES ('BA'). This row has the value BA for column Previous_level. Therefore, the row is mapped to an (anonymous) instance of the class Student_master_degree that has the same value for the datatype property Previous_level, as illustrated in Figure 4.
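The row rule can be sketched as a function that turns one row, given as a column-to-value dictionary, into an anonymous individual of the table's class. The Turtle-like blank-node notation `[]`, the `ex:` prefix, and the function name are assumptions made for illustration.

```python
def row_to_instance(table: str, row: dict) -> str:
    """Map a data row to an anonymous instance of the class created for the table.

    Each column value becomes the value of the property named after the column;
    strings are quoted, other values are written as plain literals.
    """
    lines = [f"[] a ex:{table}"]
    for column, value in row.items():
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"    ex:{column} {literal}")
    return " ;\n".join(lines) + " ."


print(row_to_instance("Student_master_degree", {"Previous_level": "BA"}))
```

For the row inserted above, the instance carries the value BA, which is consistent with the hasValue restriction that Case 5.1 places on the class.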

[Figure 4, ontology part: PrfDB:Student_Master_Degree has rdf:type owl:Class and is rdfs:subClassOf a blank node (gerud:A134368) of rdf:type owl:Restriction with owl:onProperty PrfDB:Previous_Level. Instance part: <Student_Master_Degree> with value BA.]

CONCLUSION AND FUTURE WORK

[Figure 4, continued: the restriction carries owl:hasValue BA.]

REFERENCES

[1] T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web”, Scientific American, May 2001, vol. 2, pp. 34–43.
[2] G. Klyne and J.J. Carroll (2004). "Resource Description Framework (RDF): Concepts and Abstract Syntax." W3C. From http://www.w3.org/TR/rdf-concepts/.
[3] D. Brickley and R.V. Guha (10 February 2004). "RDF Vocabulary Description Language 1.0: RDF Schema." W3C. From http://www.w3.org/TR/2004/REC-rdf-schema-20040210/.
[4] B. Cuenca Grau, B. Motik, Z. Wu, A. Fokoue, C. Lutz, "OWL 2 Web Ontology Language: Profiles", W3C Working Draft, 11 April 2008, http://www.w3.org/TR/2008/WD-owl2-profiles-20080411/.
[5] J. Geller, S.A. Chun, Y.J. An, “Towards the semantic deep web”, IEEE Computer, vol. 41, pp. 95–97, Sept. 2008.
[6] L. Tambulea and A. Sabau, “Relational Databases and Resource Description Framework”, Studia Univ. Babes-Bolyai, Informatica, vol. 2, Aug. 2009.
[7] H. Zhuge, Y. Xing, P. Shi, “Resource space model, OWL and database: Mapping and integration”, ACM Transactions on Internet Technology, vol. 8, Sep. 2008.
[8] O. Vasilecas, D. Bugaite, J. Trinkunas, “On Approach for Enterprise Ontology Transformation into Conceptual Model”, International Conference on Computer Systems and Technologies, pp. 185-207, 2006.
[9] S.S. Sahoo, W. Halb, S. Hellmann, “A Survey of Current Approaches for Mapping of Relational Databases to RDF”, W3C RDB2RDF Incubator Group, Jan. 2009.
[10] M. Li, X. Du, S. Wang, “Learning Ontology from Relational Database”, 4th International Conference on Machine Learning and Cybernetics, vol. 6, pp. 3410-3415, Aug. 2005.
[11] J. Barrasa, O. Corcho, A. Gómez-Pérez, “R2O, an eXtensible and Semantically Based Database-to-Ontology Mapping Language”, Second Workshop on Semantic Web and Databases, pp. 1069-1070, 2004.
[12] I. Astrova, A. Kalja, “Towards the Semantic Web: Extracting OWL Ontologies from SQL Relational Schemata”, Proceedings of IADIS International Conference WWW/Internet, pp. 62-66, 2006.
[13] A. Buccella, M. Penabad, F. Rodriguez, A. Farina, A. Cechich, “From Relational Databases to OWL Ontologies”, in: Proceedings of the 6th National Russian Research Conference, 2004.
[14] T. Berners-Lee (2000). Semantic Web - XML2000, slide 10. Retrieved April 22, 2009, from http://www.w3.org/2000/Talks/1206-xml2ktbl/slide10-0.html.
[15] G. Shen, Z. Huang, X. Zhu, X. Zhao, “Research on the Rules of Mapping from Relational Model to OWL”, Workshop on OWL: Experiences and Directions, vol. 216, 2006.
[16] Z. Xu, S. Zhang, Y. Dong, “Mapping between relational database schema and OWL ontology for deep annotation”, in WI'06, IEEE/WIC/ACM International Conference on Web Intelligence, DOI: 10.1109/WI.2006.114, pp. 548-552, Dec. 2006.

Figure 4. Ontology extracted from the RDB in Case 5.1 (CHECK on a single value) and its instance
