Demonstrating KDBMS: A Knowledge-based Database Management System
Mohamed E. Khalefa
Sameh S. El-Atawy
Faculty of Engineering, Alexandria University Alexandria, Egypt
Accorpa, LLC Alexandria, Egypt
[email protected]
[email protected]
ABSTRACT
We demonstrate KDBMS, a prototype system that seamlessly integrates a knowledge base with a DBMS. State-of-the-art approaches, i.e., ontology-based data access (OBDA), use ontologies only to query data stored in relational databases using SPARQL. In this demo, we present a high-level description of the proposed system, introduce a new knowledge-based query language, denoted as KQL, and highlight query optimization opportunities that arise from employing knowledge across database layers in query optimization and query processing, while easing the administration of a complex database schema.
1. INTRODUCTION
Current database installations exhibit complex schemas with a large number of objects. For instance, a typical installation of SAP ERP consists of tens of thousands of tables. This greatly complicates database administration and slows down the development and maintenance cycles of database-based software.
Two main approaches have been proposed to mitigate schema complexity: ontology-based data access and keyword search. The former relies on the existence of a mapping from the conceptual schema (i.e., ontology classes and properties) to the relational model (e.g., tables and attributes). Queries are usually expressed in the SPARQL query language [14] and then transformed into a union of conjunctive SQL queries using rewriting techniques based on the intensional level of the ontology (i.e., the TBox). The extensional layer of the ontology (i.e., the ABox) is stored in a database as a virtual ABox.
Keyword-based search over relational databases [5, 8], which is aimed at casual business users, has also been proposed. The query in this scenario is not structured; it is composed of a set of keywords that may refer either to objects in the schema (such as tables or attributes) or to instance values. Based on the matching results, a query template is chosen and executed with the input keywords, and the results are presented to the user. Clearly, the latter approach is ambiguous and cannot be used by software developers in developing application logic.
Database management systems and knowledge bases have evolved as separate fields. Adding reasoning to a DBMS would benefit it by (1) easing database administration, (2) improving query performance, and (3) enriching interaction by providing tools to explore, visualize, and suggest interesting queries (for example, recommending queries based on previous queries and interesting data patterns). Query performance can be improved by using reasoning to enhance the physical design and to recognize equivalence and disjointness between tables, which allows queries to be transformed into simpler or more efficient ones. Furthermore, by using metadata about functions, such as transitivity, inclusion, and dominance, a DBMS can avoid unnecessary computations and automatically employ optimizations such as filter-and-refine. While the ontology-based data access (OBDA) approach is limited to accessing the data, our approach applies to wider usage scenarios.
On the other hand, SQL, first introduced in 1974, is still the standard query language for relational databases, with wide support from commercial vendors and many features added over the years. There is an apparent need in the database community to revise the SQL language [6] or to create a new query language for relational databases (e.g., the D language [12]).
In this demonstration, we (1) present KDBMS, a knowledge-based DBMS that is a step towards integrating knowledge-based and database management systems, and (2) demonstrate our proposed knowledge-based query language, which introduces a higher level of abstraction for the system users. Ambiguity is resolved by using rules based on graph structure, relations between concepts, and previous query history. We cover two possible usage scenarios: (1) using an existing relational model, together with the query history of DDL and DML SQL queries, to create the ontology layer, and (2) using an ontology model to build the relational data model. In both scenarios, users can improve the ontology and add metadata about it, such as inclusion, equivalence, and disjointness.
The main contributions of our paper can be summarized as follows:
• We highlight the importance of integrating knowledge and databases to handle the increasing complexity of schemas and applications.
• We introduce our preliminary system design and highlight its main modules.
• We present a novel knowledge-based query language, denoted as KQL.
SSDBM '16, July 18-20, 2016, Budapest, Hungary
© 2016 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-4215-5/16/07...$15.00 DOI: http://dx.doi.org/10.1145/2949689.2949714
2. RELATED WORK
2.1 Knowledge representation and ontology
An ontology is a formal description of a knowledge base. It consists of a finite set of concepts (classes of objects of the domain), relationships among these concepts, properties of relationships, value restrictions, and a specification of logical relationships between objects. A knowledge base consists of an extensional layer (i.e., the ABox), e.g., "John is a person", and an intensional layer (i.e., the TBox), e.g., "All students are persons". An ontology O can be formally defined as the quad-tuple O = (C, R, I, A) [13], where C is the set of concepts, R is the set of binary relations, I is the set of instances, and A is the set of axioms. The Web Ontology Language version 2, OWL 2 for short, is the state-of-the-art ontology language for the semantic web and a W3C recommendation [1]. Although the full OWL 2 language is very expressive, its computational cost is high (reasoning can even be undecidable). Therefore, W3C has introduced sublanguages and profiles of OWL with different expressiveness and computational costs, such as OWL 2 QL, OWL DL, and OWL Lite. OWL DL is more expressive than OWL Lite and incurs a higher computational cost, yet, unlike OWL Full, it is decidable. We use OWL DL to express our proposed approach, as in [21].
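The quad-tuple O = (C, R, I, A) can be pictured as a small data structure. The following is only a minimal sketch: the class and field names are illustrative assumptions, axioms are restricted to subclass statements, and the ABox assertion that John is a student is invented so that the TBox axiom "all students are persons" has something to act on.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy container for O = (C, R, I, A); all names here are illustrative."""
    concepts: set = field(default_factory=set)      # C: concept names
    relations: set = field(default_factory=set)     # R: (name, domain, range) triples
    instances: dict = field(default_factory=dict)   # I: instance -> asserted concepts (ABox)
    axioms: set = field(default_factory=set)        # A: (sub, super) subclass axioms only (TBox)

    def superconcepts(self, concept):
        """Return `concept` plus every concept reachable via subclass axioms."""
        seen, stack = {concept}, [concept]
        while stack:
            current = stack.pop()
            for sub, sup in self.axioms:
                if sub == current and sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    def is_instance_of(self, instance, concept):
        """True if an asserted concept of `instance` is subsumed by `concept`."""
        asserted = self.instances.get(instance, set())
        return any(concept in self.superconcepts(c) for c in asserted)


# Assumed example data: the ABox says John is a Student, the TBox says
# every Student is a Person.
onto = Ontology(
    concepts={"Person", "Student", "Course"},
    relations={("enrolledIn", "Student", "Course")},
    instances={"John": {"Student"}},
    axioms={("Student", "Person")},
)
```

With this data, `onto.is_instance_of("John", "Person")` holds by inference even though it was never asserted directly, which is the kind of intensional knowledge the TBox contributes.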
2.2 Query Languages for Relational databases
A key factor in the success and wide adoption of the relational model for databases is that it provides physical and logical data independence. The application does not necessarily need to know the physical representation of the relational model, and this representation may change without affecting legacy applications. SQL has been by far the dominant query language for relational databases since it was introduced within the IBM System R research project in 1974 [10]. Several research efforts, dating back to 1977, have aimed to let casual users interact easily with the DBMS through visual query languages (e.g., [29, 18, 20]) or XML databases such as [28].
Typical database installations have complex schemas. This complexity is mainly due to (1) a large number of objects (e.g., tables, views, and stored procedures), (2) a complex physical design with partitioning, normalization, and denormalization, (3) a long deployment history with many incremental changes to support modified requirements and new applications, and (4) naming conventions of database objects. Unfortunately for software developers, the effort needed to fully understand the database schema and the interactions among its objects greatly slows the development and maintenance cycles.
2.3 Ontology-based Data Access
Ontology-based data access uses a conceptual layer that is defined as an OWL ontology or in RDFS [9]. The ontology (with instances) may be materialized as one fact table (denoted as a triple store) or as a relational schema based on metadata from the ontology [26]. The terms in the conceptual layer are mapped to the data layer using mappings, which associate with each element of the conceptual layer a (possibly complex SQL) query over the data sources. The mappings have been formalized in the R2RML W3C standard [25]. ABoxes are stored as virtual tables, as in Mastro [2] and ontop [7]. As noted by Christoph Pinkel [22], the cost of manually creating a mapping constitutes a significant entry barrier for applying OBDA in practice, and he proposes an incremental mapping algorithm. An interested reader may refer to [24] for a survey of mapping techniques. In our proposed approach, we support creating the relational schema from a conceptual ontology.
Another obstacle for OBDA is that transforming a SPARQL query over the conceptual schema into SQL queries can be undecidable. The rewriting process uses the intensional layer and outputs a union of conjunctive queries (UCQ). OWL 2 QL captures the expressive power of UML languages; it is well suited for working with a very large number of individuals and for cases where data must be accessed directly via relational queries. As shown in [17], the complexity of rewritten queries can be exponential. Fortunately, in practical cases, rewriting can be executed in polynomial time; see [16]. In our proposed approach, we find the shortest path connecting the objects in a query and apply the intensional layer of the ontology at data insertion by replicating values.
Some research prototypes, such as SODA [8] and ligDB [19], have used ontological concepts to help the user interact with databases. Oracle includes RDF storage and query processing in Oracle 10gR2 [11], as well as a scalable inference engine for a subset of OWL DL [27]. Our approach proposes a higher level of integration between knowledge bases and databases. For example, we use the ontology constraints to estimate the cardinality of a relation in query optimization. On the other hand, Ultrawrap [23, 3] creates database views over the underlying relational model. Each view represents a triple store holding triples of subject, predicate, and object. In that approach, the user needs to know the size and type of each property and value she accesses. We demonstrate a more generic approach that allows fuzzy matching between objects and queries.
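The idea of finding the shortest path that connects the objects mentioned in a query can be illustrated with a breadth-first search over a schema graph. This is a sketch under stated assumptions, not the system's actual algorithm: the graph, the table names, and the function `shortest_join_path` are all hypothetical, and edges stand for foreign-key style links between tables.

```python
from collections import deque

# Hypothetical schema graph: nodes are tables, edges are join links.
SCHEMA_GRAPH = {
    "student": ["enrollment"],
    "enrollment": ["student", "course"],
    "course": ["enrollment", "department"],
    "department": ["course"],
}


def shortest_join_path(graph, source, target):
    """Breadth-first search for the shortest chain of tables joining
    `source` to `target`; returns a list of tables or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        table = queue.popleft()
        for neighbour in graph.get(table, []):
            if neighbour in parents:
                continue  # already reached by a path that is no longer
            parents[neighbour] = table
            if neighbour == target:
                # Walk the parent links back to the source.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None
```

A query that mentions only `student` and `department` would, under these assumptions, be joined through `enrollment` and `course`; when no path exists the function returns `None`, which is a case where user intervention or additional rules would be needed.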
2.4 Keyword Search
Recent research efforts, such as SODA [8], aim to give business users access to databases. However, SODA [8] and [15] focus on keyword search and target business users (i.e., not software developers) as the main system users. Our approach allows more complex queries than keyword search and simplifies DBA and developer interaction with the underlying complex relational database.
3. KDBMS SYSTEM OVERVIEW
In this section, we highlight the main modules of our proposed system and introduce the syntax of our proposed query language, denoted as KQL. Finally, we give some examples of our proposed language, which allows DBAs and developers to interact with the DBMS at the level of the conceptual schema. Our system optimizes queries in KQL and converts them into equivalent SQL queries that are executed by the underlying relational databases.
The advantages of adding a conceptual schema can be summarized as follows:
(1) A higher abstraction level for a complex DBMS allows the DBA to optimize and change the logical database without breaking current queries. Without such a layer, DBAs have to create views representing the previous relational schema so as not to break the existing SQL queries and stored procedures. This is a tedious and time-consuming task for DBAs. An important disadvantage of the view-based approach is that it creates even more objects, further complicating the logical design of the relational database. Moreover, some DBMSs impose restrictions on insertion into views, which may break the existing code.
(2) Queries at the conceptual level are much simpler, with a smaller number of tokens, in comparison to SQL.
(3) By writing queries in a more abstract way and employing knowledge about the model, we can potentially generate more optimized plans for the queries. For instance, we can estimate cardinality more accurately, or we can use replication and information about shards to optimize the SQL query.
(4) With this approach, we may use DBMSs from heterogeneous vendors, or even NoSQL databases once support is extended to include them.
In addition, end users can use the GUI for the conceptual model to interact easily with the relational model. They can use the visual interface to navigate the model and run queries in the proposed query language. Using our proposed language, DBAs can easily administer a heterogeneous, distributed, complex database: optimizing the physical design, replicating objects, and partitioning tables horizontally or vertically. DBAs may also add information about update frequency and about dependencies between attributes (e.g., between the sum of sales and the sales themselves); this information can be useful in query optimization.
Figure 1: KDBMS System Overview (labels: Conceptual Schema, Logical Schema, Physical Schema, GUI, KQL Parser, Schema Physical Designer, Knowledge-based Query Optimizer, Ontologies Domains, RDBMS; users: DBA, End User, Developer)
Figure 1 illustrates the main modules and data structures of the proposed system, which can be described as follows:
Schema Physical Designer. The main objective of this module is to keep the relational schema synchronized with the conceptual model. For example, when the conceptual model is updated, the changes are reflected in the relational model.
KQL Parser. This module is responsible for translating a KQL query into an SQL query. Based on similarity search, identifiers in KQL are fuzzy matched against the conceptual, logical, and physical schemas. For example, "student" in a query may be matched with the relational table ed_stud. If no match or multiple matches are found, we resolve the conflict using user-defined or system-defined rules, which may use information on previous queries or user intervention. Note that we do not apply TBoxes in this module. For example, assume a TBox states that anyone enrolled in a class is a student.
Then, we only need to access the students table, not the enrollment information.
GUI. The GUI module allows users (e.g., end users, DBAs, and developers) to interact visually with our proposed system. The module supports navigating the schema across the conceptual, logical, and physical levels, showing the dependencies between objects and the mappings across levels. It also displays KQL queries and the SQL queries they are mapped to.
Knowledge-based Query Optimizer. This module is used by the KQL Parser to perform knowledge-based optimization using domain and application knowledge, unlike traditional database management systems. For example, if we know that graduate courses can be registered only by graduate students, the system can transform a query joining students with graduate courses into a join between graduate students and graduate courses. In another possible scenario, assume the DBA has created partitions and views; the query optimizer should then use this information on equivalence, disjointness, and partitions.
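Two of the module behaviours described above, fuzzy identifier matching in the KQL Parser and knowledge-based narrowing of a join in the optimizer, can be sketched in a few lines. This is an illustration only and not the paper's implementation: the table list, the similarity cutoff, the history-based tie-breaking rule, and the rule table `NARROWING_RULES` are all assumptions, and string similarity from Python's standard `difflib` stands in for whatever similarity search the system actually uses.

```python
import difflib

# Hypothetical physical table names (ed_stud is an assumed spelling).
TABLES = ["ed_stud", "ed_course", "ed_staff"]


def match_identifier(name, candidates, cutoff=0.5):
    """Return the candidates similar enough to `name`, best match first."""
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[h] for h in hits]


def resolve(name, candidates, query_history=()):
    """Pick a single table; on multiple matches prefer the one used most
    recently (assumed system-defined rule). None means ask the user."""
    hits = match_identifier(name, candidates)
    if len(hits) == 1:
        return hits[0]
    for table in query_history:  # assumed ordering: most recent first
        if table in hits:
            return table
    return None


# Hypothetical domain knowledge: joining `general` with `partner`
# can only produce rows that belong to the more specific concept.
NARROWING_RULES = {("student", "graduate_course"): "graduate_student"}


def narrow_join(concepts, rules):
    """Replace a general concept by a specific one when a rule allows it."""
    result = list(concepts)
    for (general, partner), specific in rules.items():
        if general in result and partner in result:
            result[result.index(general)] = specific
    return result
```

For instance, `resolve("student", TABLES)` selects `ed_stud` under the assumed cutoff, and `narrow_join(["student", "graduate_course"], NARROWING_RULES)` rewrites the join to use `graduate_student`, mirroring the graduate-course example in the text.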
3.1 Knowledge-based Query Language
In this section, we present the basic constructs of our proposed knowledge-based query language. We designed KQL to be similar to the SQL query language. Note that any valid SQL statement is also valid in KQL. The syntax of KQL can be described as follows: KQL::=[varName "