INHERITANCE CONSTRAINTS IMPLEMENTATION IN POSTGRESQL Dragoljub Pokrajac, Hetal Patel, Marwan Rasamny, Delaware State University {dpokraja, hetalp,rasamny}@desu.edu Aleksandar Sec, EI Electromedicine,
[email protected]

Abstract – In this study, we consider specialization/generalization constraints of subclass/superclass hierarchies in object-relational databases. After formally defining the constraints, we discuss the advantages and disadvantages of current SQL standards and their PostgreSQL dialect. Subsequently, we propose a possible implementation of the constraints using triggers and SQL functions and discuss an extension of the PostgreSQL language that can provide better definition and maintenance of the specialization/generalization constraints.

1. INTRODUCTION

Relational and object-relational database management systems (DBMS) [1] are a powerful and widespread technology, now also available in affordable, open-source packages. A typical object-relational DBMS uses a variant of the SQL language [2-4] for data definition and data manipulation that supports the relational implementation of entity and relationship types as described by entity-relationship (ER) diagrams [1]. However, to be fully capable of implementing class/subclass relationships from extended ER (EER) diagrams, a DBMS must have provisions for specifying and maintaining attribute and relationship inheritance. Unfortunately, although recent standardizations of SQL [3,4] include support for subclass/superclass relationships, they still lack the capability to easily define inheritance constraints (e.g., that an object may belong to at most one child of a parent class). This presents a major obstacle in mapping conceptual schemas expressed by EER diagrams into logical schemas where data are organized in relations.

Inheritance constraints (also known as specialization/generalization constraints) can be enforced at the application level, e.g., through JDBC [5]. A major disadvantage of this approach is that it does not clearly separate external views from the conceptual schema. Hence, in this paper we discuss the alternative approach: implementing the constraints through the data definition language (DDL). We first define inheritance constraints and the specific rules to be maintained for different types of subclass/superclass inheritance. Subsequently, we discuss the provisions for the constraints in the current SQL standard and in PostgreSQL, an open-source DBMS. Next, we propose an implementation of the constraints using triggers and SQL functions in PostgreSQL [6,7] and discuss its application in other object-relational DBMS. Finally, we propose an extension of the PostgreSQL INHERITS clause that can accomplish the definition of generalization/specialization constraints in a consistent and straightforward manner.

2. INHERITANCE CONSTRAINTS

Inheritance constraints naturally arise from superclass/subclass hierarchies in semantic data modeling [1]. In this model, each object may belong to one class as well as to its subclasses (children) or a superclass (parent). Specifically, a member of a subclass is also a member of all its superclasses. We define a hierarchy as a semantic dependence of classes where one class figures as the parent and one or several as children. In this paradigm, multiple inheritance is modeled such that one hierarchy is associated with each superclass of the observed class.

Inheritance constraints are divided into disjointness and completeness constraints [1]. The disjointness constraint specifies whether a specific object may be a member of several children of the same parent. Thus, a hierarchy may be disjoint (if each object is allowed to belong to at most one child) or overlapping (if an object may belong to multiple children). The completeness constraint specifies whether an object that is a member of a parent must also belong to a child. Hence, a hierarchy may be total (when each object from the parent class must belong to at least one child) or partial (when an object from the parent is allowed not to belong to any child).

Consider the hierarchy H(P, C1, C2, …, Cn), with the parent P and children C1, C2, …, Cn (Fig. 1). Without loss of generality, we assume that the parent has a single key attribute $p_0$ and non-key attributes $p_1, p_2, \dots, p_{m_0}$. A child Ci, i=1,…,n, in addition to the attributes inherited from the parent, also has attributes $c_{i,1}, c_{i,2}, \dots, c_{i,m_i}$. Within H, each object t is uniquely identified by the value of $p_0$ and may belong to one or multiple classes.

Fig. 1. EER diagram of a class hierarchy H(P, C1, C2, …, Cn) with class P as the parent and classes C1, C2, …, Cn as its children.

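Due to the key constraints, all tuples t in each class are different [1]. Hence, the following holds:

$$(\forall T \in \{P, C_1, C_2, \dots, C_n\})\ (\neg\exists\, t_i, t_j \in T,\ i \neq j)\ t_i = t_j \tag{1}$$

Due to the existence of the generalization/specialization hierarchy, the following must hold:

$$\big((\exists\, C_i,\ i \in \{1, \dots, n\})\ t \in C_i\big) \Rightarrow t \in P \tag{2}$$

For hierarchies with partial completeness and overlapping disjointness, the inheritance constraints are fully described by (2). If the completeness is total, the converse statement also holds in addition to (2):

$$t \in P \Rightarrow (\exists\, C_i,\ i \in \{1, \dots, n\})\ t \in C_i \tag{3}$$

Finally, if the hierarchy is disjoint, there exists at most one child to which an object t belongs. Writing $\exists^{*}$ for "there exists at most one", the following is also true:

$$t \in P \Rightarrow (\exists^{*}\, C_i,\ i \in \{1, \dots, n\})\ t \in C_i \tag{4}$$

Note that if the hierarchy has total participation and disjoint specialization, eqs. (3) and (4) combine into the following, where $\exists_{1}$ stands for "there exists exactly one":

$$t \in P \Rightarrow (\exists_{1}\, C_i,\ i \in \{1, \dots, n\})\ t \in C_i \tag{5}$$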
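As a rough illustration of how constraints (2)-(4) can be checked on stored data, the following sketch assumes a two-child hierarchy stored in three hypothetical tables h_parent, h_child1 and h_child2 (these names are not taken from this paper), each keyed by p0. Constraint (1) is covered by the PRIMARY KEY declarations. Each query returns the key values that violate the corresponding constraint, so an empty result means the constraint holds.

```sql
-- Hypothetical tables for a hierarchy H(P, C1, C2); names are illustrative only.
CREATE TABLE h_parent (p0 INTEGER PRIMARY KEY);
CREATE TABLE h_child1 (p0 INTEGER PRIMARY KEY);
CREATE TABLE h_child2 (p0 INTEGER PRIMARY KEY);

-- Violations of (2): objects in a child that are missing from the parent.
SELECT c.p0 FROM h_child1 c
WHERE NOT EXISTS (SELECT 1 FROM h_parent p WHERE p.p0 = c.p0)
UNION
SELECT c.p0 FROM h_child2 c
WHERE NOT EXISTS (SELECT 1 FROM h_parent p WHERE p.p0 = c.p0);

-- Violations of (3), total completeness: parent objects that are in no child.
SELECT p.p0 FROM h_parent p
WHERE NOT EXISTS (SELECT 1 FROM h_child1 c WHERE c.p0 = p.p0)
  AND NOT EXISTS (SELECT 1 FROM h_child2 c WHERE c.p0 = p.p0);

-- Violations of (4), disjointness: objects present in more than one child.
SELECT c1.p0 FROM h_child1 c1 JOIN h_child2 c2 ON c2.p0 = c1.p0;
```

Queries of this kind only detect violations after the fact; preventing them during insert, delete and update is what the maintenance rules are for.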
Generalization/specialization constraints expressed by eqs. (1)-(5) should be maintained during insert, delete and update operations on the objects belonging to the classes of the hierarchy H. Formal rules for maintaining the constraints are shown in Table 1.

3. INHERITANCE AND SQL

The SQL language is one of the major reasons for the success of relational database technology. SQL:1999 [3] is a revision of the SQL standard that aims to bring support for object-oriented data management into an essentially relational language. SQL:1999 introduces structured data types (SDT), which are analogous to classes in object-oriented languages. An SDT may be non-instantiable (equivalent to a C++ abstract class), in which case no object may belong directly to the SDT. An SDT may inherit attributes from another SDT, but multiple inheritance is not allowed. The set of all types that inherit (directly or indirectly) from a single type comprises a type hierarchy. In SQL:1999, generalization and specialization are supported through a hierarchy of database tables (so-called typed tables). The table hierarchy is created as a subtree of the type hierarchy. Rows of typed tables are referenced through special REF types that may refer to a specified column of the table or to a system-generated key. Other new features of SQL:1999 relevant to generalization and specialization include [2] triggers, improved view updatability and functions.

The standardization of SQL is still a work in progress. However, the latest revision, SQL:2003 [4], did not bring significant improvements over SQL:1999 with regard to specialization hierarchies. The inheritance mechanism of the current SQL standard is redundant (e.g., REF types) and incomplete. For instance, it does not provide multiple inheritance and does not support hierarchies of different types with the same parent (e.g., H1(P,C1,...,CN) and H2(P,C1',...,CN'), where H1 is disjoint total and H2 is partial).
Also, non-instantiable SDTs can automatically support only overlapping total hierarchies. Finally, due to difficulties in implementation, no known DBMS conforms to SQL:1999 yet [3].

4. POSTGRESQL AND INHERITANCE CONSTRAINTS

PostgreSQL [6] is an open-source DBMS that originated from the Postgres and Ingres projects. It is one of the most advanced DBMS available. PostgreSQL is well-documented, client-server oriented and ACID compliant. It supports most of SQL:1992 and many SQL:1999 features. Unlike MySQL, PostgreSQL implements (with minor differences) the SQL:1999 specification for triggers and extensively supports stored procedures and functions. The current version of PostgreSQL (v. 7.4.2) [7] does not support
automatic view update, but view update can be defined by the user through the rule system (an extension of SQL that does not exist in the current SQL standard). Due to the aforementioned features, PostgreSQL is an appealing alternative for database applications, especially for designs involving generalization/specialization hierarchies.

Currently, PostgreSQL has support for SDTs, but such data types cannot inherit from others. Hence, PostgreSQL does not use the mechanism of SDTs to accomplish specialization/generalization hierarchies. Instead, PostgreSQL introduces another language extension: an INHERITS clause that specifies attribute inheritance at the table level. For a simple hierarchy H(P,C1,C2), the current PostgreSQL use of INHERITS is illustrated below, where a parent table P and its children C1 and C2 are declared as:

    CREATE TABLE P (P0 INTEGER PRIMARY KEY);
    CREATE TABLE C1 (C11 INTEGER) INHERITS (P);
    CREATE TABLE C2 (C21 INTEGER) INHERITS (P);

One of the advantages of this approach is that the hierarchy is not specified twice (at both the type and the table level, as in the SQL:1999 standard). Also, unlike SDTs in SQL:1999, INHERITS in PostgreSQL supports multiple inheritance. However, the INHERITS clause as currently implemented has several major drawbacks that prevent its widespread use for specifying hierarchies. Due to implementation and performance issues, INHERITS in PostgreSQL is emulated with views. For instance, when selecting from P, the result is equivalent to selecting from the following view (defined in SQL:1999, with P0 being the only column of P):

    CREATE VIEW P_VIEW AS
        (SELECT * FROM P)
        UNION ALL CORRESPONDING BY (P0)
        (SELECT * FROM C1)
        UNION ALL CORRESPONDING BY (P0)
        (SELECT * FROM C2);

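The behaviour described above can be observed directly. The following is a minimal sketch, assuming the tables P, C1 and C2 declared earlier in this section exist and are empty; the inserted values are arbitrary. It also shows that the primary key declared on P is not enforced across the hierarchy.

```sql
-- Rows stored in a child are visible through the parent.
INSERT INTO C1 (P0, C11) VALUES (1, 10);
SELECT P0 FROM P;        -- includes the row stored in C1
SELECT P0 FROM ONLY P;   -- ONLY restricts the scan to rows stored in P itself

-- The primary key of P is checked against rows of P only:
-- this insert succeeds although P0 = 1 already exists in C1.
INSERT INTO P (P0) VALUES (1);
SELECT P0 FROM P;        -- the value 1 is now returned twice
```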
Inherited tables inherit only column characteristics, but not primary key and foreign key constraints, triggers, indices, etc. Since primary key constraints are not inherited, inserting tuples with the same primary key value into a parent and a child can lead to a violation of the key constraint (SELECT * FROM P; then returns multiple tuples with the same “primary key” value). Since foreign key constraints apply only to parent tables (not to their inheriting children!), the insertion of tuples into another table that refers to the parent will not be allowed if the referred tuples physically exist only in a child table. Since user-defined triggers are not inherited, a trigger defined on the parent will not be activated if the operation (insertion/deletion/modification) is performed on a child, unless the trigger is replicated on each child. Due to the aforementioned problems with INHERITS, to enforce generalization/specialization constraints in PostgreSQL it is necessary either to apply other mechanisms that already exist in SQL (e.g., triggers, views, functions) or to modify the INHERITS clause. In the rest of the paper, we discuss both options.

5. INHERITANCE IMPLEMENTATION WITH TRIGGERS AND SQL FUNCTIONS

In this section, we illustrate the implementation of inheritance constraints using triggers and SQL stored procedures. Although the examples are given in the PostgreSQL dialect, they can be easily adapted for other DBMS with similar functionality. Due to lack of space, here we discuss
only the implementation of a hierarchy with overlapping total specialization. Also, we will not discuss simultaneous insertion into parents and children, which can be accomplished using views. An interested reader is referred to our web page http://gibran.desu.edu/hbcu-up/ (select TOOLS) for more details about these issues.

Consider the hierarchy H(P,C1,C2), where P contains a primary key p0 and a non-key attribute p1. In addition to the primary key, the class C1 has attributes c11 and c12, while C2 has attributes c21 and c22. In our design, the classes are represented by tables p_o_t, c1_o_t and c2_o_t, which are described by the following DDL statements:

    CREATE TABLE p_o_t (
        p0 INTEGER,
        p1 INTEGER,
        p_counter INTEGER,
        PRIMARY KEY (p0));

    CREATE TABLE c1_o_t (
        p0 INTEGER,
        c11 INTEGER,
        c12 INTEGER,
        PRIMARY KEY (p0),
        FOREIGN KEY (p0) REFERENCES p_o_t (p0)
            ON UPDATE CASCADE ON DELETE CASCADE);

    CREATE TABLE c2_o_t (
        p0 INTEGER,
        c21 INTEGER,
        c22 INTEGER,
        PRIMARY KEY (p0),
        FOREIGN KEY (p0) REFERENCES p_o_t (p0)
            ON UPDATE CASCADE ON DELETE CASCADE);

To accomplish the insertion rules from Table 1, we can create the function

    CREATE OR REPLACE FUNCTION INSERT_c() RETURNS TRIGGER AS '
    DECLARE
        p0_temp INTEGER;
    BEGIN
        -- Look for a parent tuple with the key of the inserted child tuple
        SELECT INTO p0_temp p0 FROM p_o_t WHERE p_o_t.p0 = NEW.p0;
        IF FOUND THEN
            -- Parent exists: one more child now shares this key
            UPDATE p_o_t SET p_counter = p_counter + 1
                WHERE p_o_t.p0 = p0_temp;
        ELSE
            -- Parent is missing: create it with a counter of one
            INSERT INTO p_o_t (p0, p_counter) VALUES (NEW.p0, 1);
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

and for each child create a trigger calling the function, as shown below for the table c1_o_t:

    CREATE TRIGGER INSERT_c1 BEFORE INSERT ON c1_o_t
        FOR EACH ROW EXECUTE PROCEDURE INSERT_c();

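A short usage sketch of the insertion rule follows. It assumes the tables and the INSERT_c function above, initially empty tables, and that a trigger analogous to INSERT_c1 has also been created on c2_o_t (as the text prescribes for each child); the key and attribute values are arbitrary.

```sql
-- First child tuple with key 1: the trigger creates the parent tuple.
INSERT INTO c1_o_t (p0, c11, c12) VALUES (1, 10, 20);
-- Second child tuple with the same key: the trigger only increments the counter.
INSERT INTO c2_o_t (p0, c21, c22) VALUES (1, 30, 40);

SELECT p0, p1, p_counter FROM p_o_t;
-- one row: p0 = 1, p1 = NULL, p_counter = 2
```

Because the trigger fires BEFORE INSERT, the parent tuple already exists when the foreign key of the child tuple is checked.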
As we can see, the insertion of a child tuple leads to the insertion of the corresponding tuple into the parent table, if such a tuple does not already exist there. In PostgreSQL, we prevent the insertion of data solely into the parent table by not executing INSERT INTO p_o_t outside of the INSERT_c function. In a DBMS that supports the SQL:1999 standard, we could simply define the SDT of the p_o_t table as non-instantiable. The role of the additional attribute p_counter is to keep track of the number of children having a particular primary key value. As we will see, this is important for deletion, when a parent should be deleted only if there are no more children sharing the corresponding primary key value.

Due to the referential triggered action specified in the foreign keys of the children tables (ON DELETE CASCADE), the deletion of a parent tuple automatically leads to the removal of the related children tuples. To support inheritance constraints for the deletion of children, we create the following function

    CREATE OR REPLACE FUNCTION delete_c() RETURNS TRIGGER AS '
    DECLARE
        p0_temp INTEGER;
    BEGIN
        -- Is the deleted child tuple the last one sharing this key?
        SELECT INTO p0_temp p0 FROM p_o_t
            WHERE p_o_t.p0 = OLD.p0 AND p_o_t.p_counter = 1;
        IF FOUND THEN
            -- Last child: remove the parent tuple as well
            DELETE FROM p_o_t WHERE p0 = p0_temp;
        ELSE
            -- Other children remain: only decrement the counter
            UPDATE p_o_t SET p_counter = p_counter - 1
                WHERE p_o_t.p0 = OLD.p0;
        END IF;
        RETURN OLD;
    END;
    ' LANGUAGE 'plpgsql';

and for each child implement a trigger, such as:

    CREATE TRIGGER delete_c1 AFTER DELETE ON c1_o_t
        FOR EACH ROW EXECUTE PROCEDURE delete_c();

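The deletion rule can be exercised as follows. This sketch assumes that insert and delete triggers analogous to INSERT_c1 and delete_c1 exist on both children, and that the key value 2 is not yet in use; all values are arbitrary.

```sql
INSERT INTO c1_o_t (p0, c11, c12) VALUES (2, 10, 20);
INSERT INTO c2_o_t (p0, c21, c22) VALUES (2, 30, 40);   -- p_counter is now 2

-- Removing one child tuple only decrements the counter.
DELETE FROM c1_o_t WHERE p0 = 2;
SELECT p_counter FROM p_o_t WHERE p0 = 2;   -- 1

-- Removing the last child tuple also removes the parent tuple.
DELETE FROM c2_o_t WHERE p0 = 2;
SELECT count(*) FROM p_o_t WHERE p0 = 2;    -- 0
```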
Here, the deletion of a child tuple decreases the counter, and if the deleted tuple was the last child with that key, the corresponding parent tuple is also deleted. The deletion trigger is activated after the deletion in the child, to prevent the problems with referential integrity that would occur otherwise.

When a parent's primary key is updated, the related primary key values in the children are automatically updated due to referential integrity (ON UPDATE CASCADE). To propagate a primary key update made in a child, we write the following function

    CREATE OR REPLACE FUNCTION update_c() RETURNS TRIGGER AS '
    BEGIN
        IF (NEW.p0 = OLD.p0) THEN
            RAISE NOTICE ''Nothing to update'';
        ELSE
            -- Precondition: tuple in parent existed! We do not need to check it!
            UPDATE p_o_t SET p0 = NEW.p0 WHERE p_o_t.p0 = OLD.p0;
        END IF;
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

and for each child create a trigger analogous to

    CREATE TRIGGER update_c1 BEFORE UPDATE ON c1_o_t
        FOR EACH ROW EXECUTE PROCEDURE update_c();

6. PROPOSED MODIFICATIONS OF SQL LANGUAGE

The implementation of inheritance constraints proposed in Section 5 is feasible in a variety of commercial DBMS that support triggers, functions and views, including DBMS that implement table hierarchies based on inherited SDTs. However, the proposed design is complex and prone to programming errors (e.g., it requires the creation of triggers for each child). Another problem is that the physical organization of data (how class attributes are distributed into different tables) is implicitly assumed and cannot be easily changed without redesigning triggers and functions. To alleviate these issues, we propose to extend the SQL language with a specialization constraint that corresponds to a hierarchy as defined in Section 2. To create a new specialization, we propose the following statement (REDUCED TABLE is the default table implementation):

    CREATE SPECIALIZATION specialization_name ON parent_table
        [WITH [disjointness_constraint] [completeness_constraint]]
        [BY table_implementation];

    disjointness_constraint := OVERLAPPING | DISJOINT
    completeness_constraint := PARTIAL | TOTAL
    table_implementation    := SINGLE TABLE | REPLICATED TABLE | REDUCED TABLE

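To make the proposed statement more concrete, the sketch below declares two hierarchies over the same parent table, mirroring the H1/H2 example from Section 3. This is the syntax proposed in this section only; it is not accepted by PostgreSQL or, to our knowledge, by any other DBMS. The names h1, h2 and P are illustrative.

```sql
-- Proposed (not implemented) syntax; h1, h2 and P are illustrative names.
-- H1: disjoint and total, stored with the default REDUCED TABLE layout.
CREATE SPECIALIZATION h1 ON P WITH DISJOINT TOTAL;
-- H2: overlapping and partial, with all attributes kept in a single table.
CREATE SPECIALIZATION h2 ON P WITH OVERLAPPING PARTIAL BY SINGLE TABLE;
```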
The proposed CREATE SPECIALIZATION statement should, in a practical realization, be accompanied by corresponding ALTER SPECIALIZATION and DROP SPECIALIZATION commands to accomplish the modification and deletion of a hierarchy. Using the proposed specialization constraint, we can specify disjointness and completeness constraints for each hierarchy. In addition to the specialization type, we can define how the subclass/superclass relationship maps into tables. Hence, SINGLE TABLE specifies that all the parent and children attributes are placed into one table [1]. Similarly, REPLICATED TABLE specifies that parent attributes also exist in children tables (similar to the current implementation of INHERITS in PostgreSQL), while the default REDUCED TABLE splits attributes as proposed in Section 5.

When creating a table that inherits from another table, we propose to specify the hierarchy through which the inheritance occurs. To this end, we propose to modify the existing INHERITS clause of the CREATE TABLE statement as defined in PostgreSQL as follows:

    [ INHERITS ( inheritable [, ... ] ) ]
    inheritable := parent_table USING SPECIALIZATION specialization_name;

The proposed design has several advantages. Once a specialization is defined, the creation of new tables that inherit using the specialization is very easy. In addition, multiple hierarchies (that share the same parent, see Section 3) are easily defined and supported. Finally, multiple inheritance (one child inheriting from multiple parents) is also feasible (if INHERITS retains the support for multiple inheritance that it currently has in PostgreSQL).

7. CONCLUSIONS

In this study, we considered inheritance constraints for subclass/superclass relationships in object-relational databases. After specifying the theoretical rules that should be maintained for different types of inheritance, we provided a practical implementation of the inheritance constraints in a DBMS that supports triggers and SQL functions. Specific examples are provided in PostgreSQL, an open-source object-relational DBMS. To support easy definition and maintenance of the inheritance constraints (with no need to explicitly specify triggers and stored procedures), we proposed an extension of the SQL standard based on the PostgreSQL INHERITS clause. Work in progress includes the actual implementation of the proposed SQL extensions in PostgreSQL through automatically generated views.

8. ACKNOWLEDGEMENTS

This work has been partially supported by NSF-funded “Seeds of Success: A Comprehensive Program for the Retention, Quality Training, and Advancement of STEM Student” Grant (award # 0310163). Additional support was provided by NSF-funded Infrastructure Grant (award # 0320991), NIH-funded Delaware BRIN Grant (P20 RR16472) and DoD HBCU/MI Infrastructure Support Program (45395-MA-ISP Department of Army).

REFERENCES

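[1] R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, 4th ed., Boston: Pearson Addison Wesley, 2003.
[2] A. Kriegel and B. M. Trukhnov, SQL Bible, Indianapolis: Wiley Publishing, 2003.
[3] J. Melton, Advanced SQL:1999, Amsterdam: Morgan Kaufmann, 2003.
[4] A. Eisenberg, J. Melton, K. Kulkarni, J.E. Michels and F. Zemke, “SQL:2003 has been published,” SIGMOD Record, vol. 33, pp. 119-126, March 2004.
[5] G. Reese, Database Programming with JDBC and Java, Sebastopol: O'Reilly & Associates, 2000.
[6] K. Douglas and S. Douglas, PostgreSQL: A Comprehensive Guide to Building, Programming and Administering PostgreSQL Databases, Indianapolis: Sams Publishing, 2003.
[7] PostgreSQL version 7.4.2, interactive documentation, http://www.postgresql.org/docs/7.4/interactive/index.html, accessed March 11, 2004.

Table 1. Rules for different types of inheritance constraints

| Operation | Intended action | Overlapping, Partial | Overlapping, Total | Disjoint, Partial | Disjoint, Total |
|---|---|---|---|---|---|
| Insert | Insert t into parent P | Insert t into P | Not allowed | Insert t into P | Not allowed |
| Insert | Insert t into child Ci | Insert t into Ci. If t ∉ P, insert t into P | Insert t into Ci. If t ∉ P, insert t into P | If t ∉ P, insert t into Ci and insert t into P. Else: if (∃Cj) j≠i, t ∈ Cj, not allowed; else, insert t into Ci | If t ∈ P, not allowed. Else, insert t into Ci and insert t into P |
| Insert | Insert t into children Ci,1, Ci,2, …, Ci,k | Insert t into Ci,1, Ci,2, …, Ci,k. If t ∉ P, insert t into P | Insert t into Ci,1, Ci,2, …, Ci,k. If t ∉ P, insert t into P | Not allowed | Not allowed |
| Delete | Delete t from parent P | Delete t from P. Delete t from any child Ci where t may exist | Delete t from P. Delete t from any child Ci where t may exist | Delete t from P. Delete t from any child Ci where t may exist | Delete t from P. Delete t from any child Ci where t may exist |
| Delete | Delete t from child Ci | Delete t from Ci | Delete t from Ci. If (¬∃Cj) j≠i, t ∈ Cj, delete t from P | Delete t from Ci | Delete t from Ci. Delete t from P |
| Delete | Delete t from children Ci,1, Ci,2, …, Ci,k | Delete t from Ci,1, Ci,2, …, Ci,k | Delete t from Ci,1, Ci,2, …, Ci,k. If (¬∃Cj, j≠i1,…,ik) t ∈ Cj, delete t from P | Not possible | Not possible |
| Update | Update primary key of t in parent P | Update primary key of t in any child Ci where t may exist | Update primary key of t in any child Ci where t may exist | Update primary key of t in any child Ci where t may exist | Update primary key of t in any child Ci where t may exist |
| Update | Update primary key of t in child Ci | Update primary key of t in parent P. Update primary key of t in any child Cj where t may exist | Update primary key of t in parent P. Update primary key of t in any child Cj where t may exist | Update primary key of t in parent P. Update primary key of t in any child Cj where t may exist | Update primary key of t in parent P. Update primary key of t in any child Cj where t may exist |
| Update | Update non-key attribute of t in parent P | Update t in P | Update t in P | Update t in P | Update t in P |
| Update | Update non-key attribute of t in child Ci | Update t in Ci | Update t in Ci | Update t in Ci | Update t in Ci |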