Normalization of Single Level Nested Structure in Object-Relational Data Model

Eric Pardede, J. Wenny Rahayu
{ekparded, wenny}@cs.latrobe.edu.au
Department of Computer Science and Computer Engineering
La Trobe University, Bundoora VIC 3083, Australia

David Taniar
[email protected]
School of Business Systems
Monash University, Clayton VIC 3800, Australia

Abstract

The Object-Relational Data Model (ORM) has brought various Object-Oriented (OO) concepts into a relational environment. One of these advances is the ROW type, which supports nested relation structures. In this paper, we propose a new normalization technique for relations with a single nested structure implemented as a ROW type attribute. To the best of our knowledge, there is no prior work on the normalization of this attribute type, yet many cases in conceptual database models can benefit from a single ROW type implementation. Our proposed normalization rules keep the original purpose of normalization, which is to remove anomalies from the database, and are based on the theory of functional dependencies. The new rules consider different scenarios that can create undesired dependencies in relations and propose ways to redesign the relations in order to avoid database anomalies.

Keywords: Object-Relational Data Model, ROW Type, Normalization

1. Introduction

The Relational Model is a widely used data model in the commercial world. Since its introduction in the 1970s, many efforts have been made to increase its modeling capability, including the capability to hold non-atomic values, or nested attributes. Many claim that such a structure violates the mature and well-known normalization rules [4, 5].

Despite this claim, some Relational Database Management System (RDBMS) vendors have implemented a nested attribute type in their products [7, 11]. The attribute type has since been standardized as the ROW data type in SQL 1999 [6] and the forthcoming SQL4 [9, 12]. Figure 1 shows the implementation of an OO conceptual model as an ORM table with a ROW attribute.

There are many cases that benefit from a ROW implementation. One example is storing non-atomic values in a single attribute, such as address (street, city, postcode, country) or name (first, middle, surname). Another example is storing the values of a table relationship so that join operations can be reduced or eliminated.

[Figure 1 diagram: class C1 (attributes A, B) is associated with class C2 (attributes M, N) with multiplicities 1/* and 1. It is implemented as TABLE C1 with columns Att A, Att B and Row C2 (Att M, Att N), holding the values Val A, Val B, Val M and Val N.]

General Syntax:
CREATE TABLE (att_i data type, ..., att_(i+n) ROW (att_(i+n)j data type, ...));

Figure 1. ROW Type Implementation

During the database design process, we might create tables with dependency types that are vulnerable to database anomalies [4, 5]. This is the motivation for the normal forms of the relational model [2]. The same problem can also occur in an ORM table with a ROW attribute.

In this paper we propose normalization rules for ORM tables with a ROW attribute. We adopt the theory of functional dependencies [1] to develop the rules. The new rules consider different undesired dependencies in the tables and propose ways to redesign the tables. To do so, we also have to classify the undesired dependencies created by a ROW attribute.

We restrict this work to rules for a single ROW attribute. We are aware that a table can also have a collection of ROWs. Normalization for the latter must address the repeating values inside the collection as well as the treatment of different collection types [12]. At the time of writing, the work on collections is in progress.

The rest of the paper is structured as follows. Section 2 gives background on nested, or ROW, attributes. In Section 3 we classify the desired and undesired dependencies inside an ORM table with a ROW attribute. In Section 4 we propose the new normalization rules, and we conclude the paper in Section 5.

2. Nested Attribute: An Overview

Long before ORM, the database community recognized nested structure attributes in flat relations. The Network Data Model is a legacy model that stores data in record types. A record type can have one or more data elements as part of its definition [10], each of which can have a non-atomic value. Such a data element is in effect a nested structure inside a record type.

Another data model, the Nested Relational Model, was developed to break the basic constraint of the Relational Model that a tuple element can only have atomic values [8]. This model offered a powerful extension to its predecessor. However, the traditional Relational Model still dominates the database community. To date, no commercial DBMS has chosen to implement the Nested Relational Model, even in its original form [5].

Finally, ORM was introduced in the 1990s and has been widely accepted, mainly because it is built on top of the ever-popular Relational Model. This data model also allows nested structure attributes inside a flat relation.

ORM is supported by Structured Query Language (SQL), the standard language for Relational Model design and manipulation. In SQL 1999 and the current SQL4, the nested structure has become one of the new constructed data types, called the ROW type [6, 12].

SQL4 provides named and unnamed ROW types. A named ROW type is identified by its name, while an unnamed ROW type is identified by its structure. Both are constructed from elements called fields. Fields are similar to table columns, except that they do not take constraints. The following example shows a ROW attribute, address.

SQL4 Unnamed ROW Type:

CREATE TABLE Author
(surname CHARACTER VARYING(20),
 initial CHARACTER VARYING(2),
 contact NUMBER,
 address ROW (street CHARACTER VARYING(50),
              city CHARACTER VARYING(30),
              state CHARACTER VARYING(5)));

SQL4 Named ROW Type:

CREATE ROW TYPE Address_Type
(street CHARACTER VARYING(50),
 city CHARACTER VARYING(30),
 state CHARACTER VARYING(5));

CREATE TABLE Author
(surname CHARACTER VARYING(20),
 initial CHARACTER VARYING(2),
 contact NUMBER,
 address Address_Type);

3. Functional Dependencies in ORM Table with ROW Attributes

The normalization process in the Relational Model typically starts by synthesizing one giant relation schema called the universal relation. Using functional dependencies specified by the designers, the universal schema is decomposed until further decomposition is no longer desirable. The process stops when the tables avoid database anomalies such as insertion, deletion, and update anomalies [4, 5].

Specifying functional dependencies plays an important role in the normalization process. Before we can develop the rules, we need to classify the dependencies that can create database anomalies, which we call undesired dependencies. With an additional ROW attribute in an ORM table, there are new dependency scenarios that are not captured in the traditional relational model.

3.1. Desired Dependency in ORM Table

A Functional Dependency (FD) is a constraint between two sets of attributes of a table/relation. An attribute Y of a relation R is functionally dependent on the attribute X of R if and only if each X-value in R is always associated with precisely one Y-value in R [1]. This is denoted X → Y. The left-hand side of the symbol is called the determinant.

In diagrammatic notation (see Fig. 2), the horizontal line represents the FD, while the vertical lines represent the subsets of attributes. The vertical line without an arrow marks the left-hand side attributes, and the lines with arrows mark the right-hand side attributes. The arrow illustrates the direction of the FD.
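The definition of X → Y above can be checked mechanically against a relation instance. The following Python sketch is an illustration added here, not part of the original method; the function name `fd_holds` and the sample rows are hypothetical. It tests whether each X-value is associated with exactly one Y-value.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Iterable[Row], lhs: List[str], rhs: List[str]) -> bool:
    """Return True if every lhs-value in rows maps to exactly one rhs-value."""
    seen: Dict[Tuple, Tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen and returns the stored
        # value afterwards, so a mismatch means x maps to two different y's.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance of a relation with attributes A, B and C.
sample = [
    {"A": 1, "B": "p", "C": "u"},
    {"A": 1, "B": "p", "C": "u"},
    {"A": 2, "B": "q", "C": "u"},
    {"A": 3, "B": "q", "C": "v"},
]

a_determines_bc = fd_holds(sample, ["A"], ["B", "C"])  # A -> BC holds here
b_determines_c = fd_holds(sample, ["B"], ["C"])        # B -> C is violated
```

Note that an instance can only refute an FD. Whether an FD is a genuine constraint of the schema remains a design decision, which is why the paper relies on FDs specified by the designers.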

[Figure 2 diagram: attributes A, B and C, with an FD line drawn from A to B and C.]

Figure 2. Functional Dependency A → BC

FDs and their inference rules [1, 5] are used to determine the semantics of relation schemas, including the candidate keys of the relations. A candidate key is a set of attributes that functionally determines every attribute in the relation, such that no proper subset has that property [3]. In Fig. 2 the key is shown by the underline. With this information we can develop the normal forms of a relation. Interested readers can find the theory of normalization and normal forms in many database textbooks [4, 5].

3.2. Undesired Dependency in ORM Table

Normalization is performed by removing undesired dependencies from the relation. Undesired dependencies in a relation R that has a set of FDs F are all FDs that are not covered by, and cannot be inferred from, any subset of F. Three types of undesired dependency can occur in a relation with a single ROW attribute: partial dependency, transitive dependency, and non-trivial functional dependency (see Fig. 3).
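The inference of FDs and the candidate keys mentioned in Section 3.1 can be made concrete with the standard attribute-closure computation. The Python sketch below is an added illustration under simplifying assumptions: attributes are single letters, the ROW nesting is flattened, and the helper names `fd`, `closure` and `candidate_keys` are hypothetical. The brute-force key search is only meant for small schemas such as the six-attribute relation used in this paper.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema: Iterable[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Return the minimal attribute sets whose closure covers the schema."""
    attrs = sorted(set(schema))
    keys: List[FrozenSet[str]] = []
    # Smaller sets are tried first, so any superset of a found key is skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) >= set(attrs):
                keys.append(cand)
    return keys


# The relation with attributes A, B, C and ROW (D, E, F), flattened,
# with the desired FD (AB) -> (CDEF).
keys = candidate_keys("ABCDEF", [fd("AB", "CDEF")])
```

With only the desired FD, the single candidate key found is {A, B}. Adding a further FD to the list changes the closure of the affected attribute sets, which is how extra dependencies can be examined before applying the rules of Section 4.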

[Figure 3 diagram: relation with attributes A, B, C and ROW (D, E, F); dependency arrows labelled PD, TD, TD and NTFD.]

Figure 3. Undesired Dependencies in ORM Table with ROW

Partial Dependency (PD) exists if, in an FD X → Y, there is an attribute Z ∈ X that can be removed without affecting the dependency. In other words, for some Z ∈ X, (X - {Z}) → Y. In Fig. 3, there is a PD from B to D because the latter depends on only a part of the composite key, in this case only on B.

Transitive Dependency (TD) exists if, in a relation R where the FD X → Y holds, there are attributes Z that are not a subset of any key of R and for which Z → Y holds. In other words, TD is a dependency between two or more non-key attributes. For example, in Fig. 3 there is a TD from D to E because the latter depends on the former, which is another non-key attribute. The same applies to the dependency from C to F.

Non-Trivial Functional Dependency (NTFD) exists if, in a relation R where the FD X → Y holds, Y can be a prime attribute of R. In other words, this dependency exists if an attribute that is not considered part of a key actually determines other attributes and thus should be a key attribute. For example, in Fig. 3 there is an NTFD from F to A because the latter, which is part of the key, depends on the former, which is a non-key attribute.

Having identified the dependencies in the table/relation, we can now propose, in the next section, the normalization rules that remove these undesired dependencies.

4. Proposed Normalization Rules

With the existence of nested structures in ORM, we find that the traditional normalization rules for the Relational Model [2] are not sufficient to cover all possible dependencies in a relation. Section 3 has shown the undesired dependencies in an ORM relation that can lead to database anomalies. In this section we propose normalization rules for an ORM relation with a single ROW attribute.

4.1. Normalization Rules for Removing Partial Dependency

PD exists if a relation has non-key attributes that depend on only part of a key. We can add a rule to remove the anomalies caused by PD. For illustration we use a simple relation with attributes A, B, C, and ROW (D, E, F).

Rule 1: Removing PD between a partial candidate key and ROW attribute(s).
For X(x1, x2, …, xk) → Y(y1, y2, …, yl), if (∃ X' ⊆ X)(∃ Y' ⊆ Y) ∋ (X' → Y'), create the relations R1(X, Y - Y') ∧ R2(X', Y').

Example: The relation R1 below has a PD because the ROW attributes are not fully dependent on the entire candidate key. In this case, D depends only on B and not on the composite {A, B}. Therefore, a new relation, R2, is created.

Relation with anomalies in PD:
Relation R1 (A, B, C, (D, E, F))
The desired FD for this relation is (AB) → (CDEF), but there is a PD (B) → (D).

Relations with PD removed:
Relation R1 (A, B, C, (E, F))
Relation R2 (B, D)
The FDs are (AB) → (CEF) ∧ (B) → (D).

4.2. Normalization Rules for Removing Transitive Dependency

TD exists if a relation has non-key attributes that depend on other non-key attributes. We can add two rules to remove the anomalies caused by TD. For illustration we use the same relation with attributes A, B, C, and ROW (D, E, F).

Rule 2a: Removing TD among ROW attributes.
For X(x1, x2, …, xk) → Y(y1, y2, …, yl), if (∃ Y' ⊆ Y)(∃ Y'' ⊆ Y) ∋ (Y' → Y''), create the relations R1(X, Y - Y'') ∧ R2(Y', Y'').

Example: The relation R1 below has a TD since there is a ROW attribute F that depends on another ROW attribute E. In this case we create a new relation, R2, using the determinant attribute as the candidate key.

Relation with anomalies in TD:
Relation R1 (A, B, C, (D, E, F))
The desired FD for this relation is (AB) → (CDEF), but there is a TD (E) → (F).

Relations with TD removed:
Relation R1 (A, B, C, (D, E))
Relation R2 (E, F)
The FDs are (AB) → (CDE) ∧ (E) → (F).

This rule applies to TD among ROW attributes. TD can also appear between non-key attributes outside the ROW and the ROW attributes. Rule 2b covers that case.
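The classification of Section 3.2 and the decompositions of Rules 1 and 2a can be sketched in a few lines of Python. This is an added illustration, not the authors' implementation. It assumes single-letter attributes, one known candidate key, and a flattened schema in which the ROW nesting is not represented. The names `classify` and `split_out` are hypothetical.

```python
from typing import Tuple


def classify(key: str, lhs: str, rhs: str) -> str:
    """Label the FD lhs -> rhs relative to one candidate key."""
    k, x, y = set(key), set(lhs), set(rhs)
    if k <= x:
        return "desired"       # the whole key determines rhs
    if x and x < k and not (y & k):
        return "PD"            # part of the key determines non-key attributes
    if not (x & k) and (y & k):
        return "NTFD"          # non-key attributes determine part of the key
    if not (x & k) and not (y & k):
        return "TD"            # non-key attributes determine non-key attributes
    return "unclassified"      # mixed cases are not covered by this sketch


def split_out(schema: str, lhs: str, rhs: str) -> Tuple[str, str]:
    """Move rhs out of schema into a new relation keyed by lhs.

    Mirrors the shape of Rule 1 and Rule 2a: R1 keeps everything except
    the dependent attributes, and R2 holds determinant plus dependents.
    """
    dependents = set(rhs)
    r1 = "".join(a for a in schema if a not in dependents)
    r2 = lhs + rhs
    return r1, r2


schema, key = "ABCDEF", "AB"

pd_label = classify(key, "B", "D")     # PD example of Rule 1
td_label = classify(key, "E", "F")     # TD example of Rule 2a
ntfd_label = classify(key, "F", "A")   # NTFD example of Fig. 3

rule1 = split_out(schema, "B", "D")    # R1 = ABCEF, R2 = BD
rule2a = split_out(schema, "E", "F")   # R1 = ABCDE, R2 = EF
```

The two results correspond to the relations listed in the examples: R1 (A, B, C, (E, F)) with R2 (B, D) for Rule 1, and R1 (A, B, C, (D, E)) with R2 (E, F) for Rule 2a. A fuller version would need to track which attributes belong to the ROW so that the nesting survives the split.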

Rule 2b: Removing TD between non-key attributes and ROW attributes.
For X(x1, x2, …, xk) → Y(y1, y2, …, yl) ∧ X(x1, x2, …, xk) → Z(z1, z2, …, zm) ∧ Y ∩ Z = ∅, if (∃ Y' ⊆ Y)(∃ Z' ⊆ Z) ∋ (Y' → Z'), create the relations R1(X, Y, Z - Z') ∧ R2(Y', Z').

Example: The relation R1 below has a TD since there is a ROW attribute F that depends on the non-key attribute C. Following the rule, a new relation, R2, is created.

Relation with anomalies in TD:
Relation R1 (A, B, C, (D, E, F))
The desired FD for this relation is (AB) → (CDEF), but there is a TD (C) → (F).

Relations with TD removed:
Relation R1 (A, B, C, (D, E))
Relation R2 (C, F)
The FDs are (AB) → (CDE) ∧ (C) → (F).

Even though the TDs handled by Rule 2a and Rule 2b exist between different attribute types, the treatment is very similar: in both cases we add a relation that has the determinant attribute as its candidate key.

4.3. Normalization Rules for Removing Non-Trivial Functional Dependency

An FD X → Y is non-trivial if it does not satisfy Y ⊆ X. In the context of this paper, this means that no part of a candidate key may depend on non-key attributes. We can add a rule for this type of dependency to remove the potential anomalies. For illustration we use the same simple relation with attributes A, B, C, and ROW (D, E, F).

Rule 3: Removing NTFD between the ROW attribute(s) and the candidate key(s).
For X(x1, x2, …, xk) → Y(y1, y2, …, yl), if (∃ X' ⊂ X)(∃ Y' ⊂ Y) ∋ (Y' → X'), create the relations R1(X, Y - Y') ∧ R2(Y', X').

Example: The relation R1 below has an NTFD since the key attribute A depends on the ROW attribute F. Thus, a new relation, R2, is created.

Relation with anomalies in NTFD:
Relation R1 (A, B, C, (D, E, F))
The desired FD for this relation is (AB) → (CDEF), but there is an NTFD (F) → (A).

Relations with NTFD removed:

Relation R1 (A, B, C, (D, E))
Relation R2 (F, A)
The FDs are (AB) → (CDE) and (F) → (A).

The contribution of our proposed rules is to remove the potential database anomalies created by undesired dependencies inside relations that have a single ROW attribute. We emphasize that more rules must be derived for collections of ROW attributes. There we have to consider not only the anomalies caused by repeating groups, but also the different collection types (set, list, bag). At the time of writing, work on these rules is still in progress.

5. Conclusion and Future Work

In this paper we propose new normalization rules for ORM relations with a single ROW attribute. Our work is motivated by the fact that the existing normalization rules do not consider nested structure attributes inside a relation.

The proposed normalization rules use the basic theory of functional dependencies. The rules consider different scenarios that can create undesired dependencies and propose ways to redesign the relations/tables in order to avoid anomalies during database manipulation.

Since this paper only covers the normalization rules for a single ROW attribute, in our next work we will propose the full normalization rules for databases with collection ROW attributes. Future work can also examine other database design issues that are affected by ROW attributes, such as integrity constraints, query planning, and tuning.

References

[1]. Armstrong, W.W.: “Dependency structures of data base relationships”. Proc. of Int’l Federation for Information Processing Congress (IFIP), North-Holland, 1974, pp. 580-583
[2]. Codd, E.F.: “Further normalization of the data base relational model”. Data Base Systems, Prentice Hall, 1972, pp. 33-64
[3]. Codd, E.F.: “Extending the Database Relational Model to Capture More Meaning”. ACM Transactions on Database Systems (TODS) 4(4), 1979, pp. 397-434
[4]. Date, C.: An Introduction to Database Systems. Addison-Wesley, 1990
[5]. Elmasri, R. and Navathe, S.B.: Fundamentals of Database Systems. Addison Wesley, 2000
[6]. Fortier, P.: SQL3 Implementing the SQL Foundation Standard. McGraw Hill, 1999
[7]. Informix: http://www-3.ibm.com/software/data/informix/, 2003
[8]. Makinouchi, A.: “A Consideration on Normal Form of Not-Necessarily-Normalized Relation”. Proc. Int’l Conf. on Very Large Data Bases (VLDB), IEEE-CS and ACM, 1977, pp. 447-453
[9]. Melton, J. (ed.): Database Language SQL – Part 2 Foundation. ISO-ANSI WD 9075-2, International Organization for Standardization, Working Group WG3, 2002
[10]. McFadden, F.R. and Hoffer, J.A.: Modern Database Management. Benjamin Cummings, 1994
[11]. Oracle: http://www.oracle.com, 2003
[12]. Pardede, E., Rahayu, J.W. and Taniar, D.: “New SQL Standard for Object-Relational Database Applications”. Proc. IEEE Conf. on Standardization and Innovation of Information Technology (SIIT), 2003, TU Delft, pp. 191-203
