A Graphical, Functional-Dependency Preserving ... - CiteSeerX

A Graphical, Functional-Dependency Preserving Normalization Algorithm for Relational Databases

Jonathan P. Bernick*
Department of Computer Science
Coastal Carolina University
Conway, SC 29528

1. Abstract The normalization of relational databases is a topic of ongoing interest. We present a graphical normalization algorithm for relational databases that is lossless, functional-dependency preserving, and able to normalize relations with multiple candidate keys. Applications of this algorithm and future research directions are discussed.

2. Introduction The normalization of relational databases, and the teaching thereof to university students, are topics of ongoing interest to the information technology community (Mitrovic, 2002; Sanders and Shin, 2001). In this paper we present a graphical normalization algorithm for relational databases that has the twin advantages of being able to handle relations with multiple candidate keys and being easy to teach to students.

Consider the case of a database designer normalizing a billing database for a small business. The database currently has one relation, R, with schema R = R(Order_id, Purchaser_id, Name, Address, City, State, Zip, Total_cost, Bill_paid), which is under an irreducible set of functional dependencies S’ = {O→PTB, P→NAZ, Z→CS}, where each attribute is represented by its first letter. As defined, R is in second normal form (2NF) but not in third normal form (3NF), and thus contains considerable redundancy. Accordingly, it is desirable to decompose R into multiple relations in 3NF; however, before doing so, let us sketch the functional dependencies of S’, using the left-hand side of one functional dependency as the right-hand side of another functional dependency whenever possible. This results in Figure 1.

Figure 1.

* E-mail: [email protected]

Figure 2.

With a standard normalization algorithm, we would create a lossless, functional-dependency preserving decomposition of R by creating one new relation for each unique left-hand side in S’ (Elmasri and Navathe, 2000), resulting in three new relations: R1(Order_id, Purchaser_id, Total_cost, Bill_paid), R2(Purchaser_id, Name, Address, Zip), and R3(City, State, Zip). Note, though, that we accomplish the same thing by circling each attribute in Figure 1 that nontransitively determines one or more attributes, together with the attributes thus nontransitively determined (Figure 2), and making the contents of each circle into a relation.

Now let us consider a slightly more complex normalization. A large manufacturing concern divides the territories in which it operates into geographical regions, each of which contains exactly one factory, each of which is managed by exactly one manager who is responsible for all company activities in that region. The unnormalized relation R which stores information about the managers has the schema R = R(Manager, Factory, Region, Title, Years_of_service, Salary, Benefits) under an irreducible set of functional dependencies S’ = {M→FTY, F→R, R→M, TY→SB}. We find ourselves in a bit of a quandary: the relation is in 2NF and requires normalization, but in addition to the primary key Manager, R has the alternate keys Factory and Region, and thus no decomposition algorithm that is lossless and guaranteed to be functional-dependency preserving exists to decompose the relation into Boyce-Codd Normal Form (BCNF) (Date, 2004). However, observe what happens when we draw a diagram of the functional dependencies of S’ in the manner of the previous example (Figure 3):

Figure 3.

Figure 4.

Figure 3 clearly illustrates that the functional dependencies M→F, F→R, and R→M form a loop, indicating that attributes R, F, and M have a 1-1 relationship, and accordingly must all be present in the same relation to avoid increasing redundancy, as should any attribute sets nontransitively dependent on R, F, or M. Accordingly, if we first circle the R-F-M loop together with all those attribute sets nontransitively dependent on R, F, or M, and thereafter circle any attribute sets not part of the R-F-M loop that nontransitively determine one or more attributes, together with the attributes thus nontransitively determined (Figure 4), and make the contents of each circle into a relation, we may create a BCNF / 3NF decomposition R1(Manager, Factory, Region, Title, Years_of_service), R2(Title, Years_of_service, Salary, Benefits) that is both lossless and functional-dependency preserving. The remainder of this paper will be devoted to formalizing and discussing the normalization method used in the above examples.
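The claim that Manager, Factory, and Region are all keys of the manager relation can be checked mechanically. The following is a minimal sketch, not part of the original method: it assumes each attribute is written as a single letter and each functional dependency as a (left-hand side, right-hand side) pair of strings, and the helper names closure and candidate_keys are hypothetical. The key search is brute force and only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire a dependency when its whole left-hand side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Brute-force search for minimal attribute sets whose closure is the schema."""
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys

# Manager example: M→FTY, F→R, R→M, TY→SB (attributes by first letter).
fds = [("M", "FTY"), ("F", "R"), ("R", "M"), ("TY", "SB")]
keys = candidate_keys("MFRTYSB", fds)
print(sorted("".join(sorted(k)) for k in keys))
```

Because smaller attribute sets are examined first, any set that already contains a known key is skipped, so only minimal keys are reported.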

3. Theory Before stating the normalization algorithm, it is necessary to define two terms. First, we will henceforth refer to illustrations of sets of functional dependencies of the form of Figures 1-4 as dependency diagrams, and note that they may be created by recursively applying the following rules to an irreducible set of functional dependencies:
1. No left-hand side of a functional dependency may appear in a dependency diagram more than once.
2. No right-hand side of a functional dependency may appear in a dependency diagram more than once.
3. Any set of nontrivial transitive functional dependencies of the form {A → B, B → C} is represented as A → B → C.

Second, consider some relation R and some set L = {A1,...,An}, where A1,...,An are attribute sets such that Ai ⊂ R and Ai transitively or nontransitively functionally determines Aj for every Ai, Aj ∈ L. We may then define L as a loop, and note that it may easily be proven that if two loops L1 and L2 exist such that L1 ∩ L2 ≠ ∅, then L1 ∪ L2 is also a loop.
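The loop definition above can be illustrated with a short sketch. It rests on assumptions that are not in the original text: attributes are single letters, only the left-hand sides of the given dependencies are considered as candidate members of a loop, and the function names are hypothetical. Two attribute sets are placed in the same loop when each lies in the closure of the other; since mutual determination is an equivalence relation, overlapping loops merge automatically, in line with the union property stated above.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def find_loops(fds):
    """Group left-hand sides that functionally determine one another."""
    lhs_sets = []
    for lhs, _ in fds:
        if frozenset(lhs) not in lhs_sets:
            lhs_sets.append(frozenset(lhs))
    loops, seen = [], set()
    for a in lhs_sets:
        if a in seen:
            continue
        # b belongs with a when a determines b and b determines a.
        group = [b for b in lhs_sets
                 if b <= closure(a, fds) and a <= closure(b, fds)]
        seen.update(group)
        if len(group) > 1:  # a single attribute set is not a loop
            loops.append(group)
    return loops

# Manager example: M, F and R form a loop; TY does not belong to it.
manager = [("M", "FTY"), ("F", "R"), ("R", "M"), ("TY", "SB")]
print([["".join(sorted(s)) for s in loop] for loop in find_loops(manager)])

# Billing example: a simple chain O → P → Z contains no loop.
billing = [("O", "PTB"), ("P", "NAZ"), ("Z", "CS")]
print(find_loops(billing))
```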

4. Algorithm Given a relation R under a set of functional dependencies S, we may normalize R by applying the following algorithm:
1. Compute an irreducible cover S’ of S.
2. Draw a dependency diagram for S’.
3. For each loop, draw a circle containing that loop and any attribute set nontransitively dependent on any attribute set in that loop.
4. For each nonloop attribute set that nontransitively determines one or more attribute sets, draw a circle containing that attribute set together with the attribute sets thus nontransitively determined.
5. For each circle created in the previous two steps, create a relation whose attributes are all the attributes contained in the attribute sets contained in that circle.
6. For each candidate key of R: If any of the attributes of that candidate key do not appear in the relations created in the previous step, create an additional relation whose attributes are the attributes of that candidate key.
7. Remove any redundant relations from the decomposition.
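Steps 3 through 7 can be sketched in code once an irreducible cover and the candidate keys are known. The sketch below is one possible reading rather than a definitive implementation: it assumes single-letter attributes, takes the cover S’ and the candidate keys as inputs instead of computing them, treats a loop as a group of left-hand sides that mutually determine one another, reads step 6 literally (a key is added only if one of its attributes appears in no relation), and treats a relation as redundant when its attributes are contained in another relation. The name normalize is hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def normalize(fds, candidate_keys):
    """Steps 3-7, applied to an irreducible cover given as (lhs, rhs) pairs."""
    # Merge dependencies that share a left-hand side (one diagram node each).
    direct = {}
    for lhs, rhs in fds:
        direct.setdefault(frozenset(lhs), set()).update(rhs)
    nodes = list(direct)

    # Step 3: one circle per loop, holding the loop and what it directly determines.
    relations, in_loop = [], set()
    for a in nodes:
        if a in in_loop:
            continue
        group = [b for b in nodes
                 if b <= closure(a, fds) and a <= closure(b, fds)]
        if len(group) > 1:
            in_loop.update(group)
            circle = set()
            for b in group:
                circle |= b | direct[b]
            relations.append(frozenset(circle))

    # Steps 4-5: one circle, and hence one relation, per nonloop determinant.
    for a in nodes:
        if a not in in_loop:
            relations.append(a | direct[a])

    # Step 6 (literal reading): add a key that has an attribute in no relation.
    covered = set().union(*relations)
    for key in candidate_keys:
        if not set(key) <= covered:
            relations.append(frozenset(key))

    # Step 7: drop duplicates and relations contained in another relation.
    unique = list(dict.fromkeys(relations))
    return [r for r in unique if not any(r < other for other in unique)]

# Manager example from the introduction, with candidate keys M, F and R.
manager = [("M", "FTY"), ("F", "R"), ("R", "M"), ("TY", "SB")]
result = normalize(manager, ["M", "F", "R"])
print(["".join(sorted(r)) for r in result])
```

For the manager example this yields one relation over Manager, Factory, Region, Title, and Years_of_service and one over Title, Years_of_service, Salary, and Benefits, matching the decomposition given in the introduction.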

We will now give two more examples:

1. Let us consider a relation R under a set of functional dependencies S such that R = R(Employee_id, Name, SSN, Title, Duties, Years_of_service, Compensation), S = {E→NSTY, S→ENTY, T→D, TY→C}. One possible irreducible cover of S is S’ = {E→NSTY, S→E, T→D, TY→C}, which results in the dependency diagram in Figure 5. Examining this diagram, we notice one loop, which results in the circle in Figure 6, and two nonloop attribute sets that nontransitively determine other attributes, resulting in the additional circles in Figure 7. From the contents of these circles, we create three relations R1(Employee_id, Name, SSN, Title, Years_of_service), R2(Title, Duties), R3(Title, Years_of_service, Compensation). Since both candidate keys of R, Employee_id and SSN, are contained in R1, and since no redundant relations exist, we are finished.

Figure 5.

Figure 6.

Figure 7.

2. The algorithm is helpful even when no functional-dependency preserving BCNF/3NF decomposition is possible. Let us consider a relation R under a set of functional dependencies S such that R = R(Name, Residence_or_business, Address, City, State, Zip), S = {NR→AZ, ACS→Z, Z→CS}. Since S is already irreducible, we create a dependency diagram (Figure 8) and circle the nontransitive dependencies (Figure 9), resulting in relations R1(Name, Residence_or_business, Address, Zip), R2(Address, City, State, Zip), R3(Zip, City, State).

Finally, since R3 ⊂ R2 and is thus redundant, we drop R3, making our final decomposition R1(Name, Residence_or_business, Address, Zip), R2(Address, City, State, Zip), with R1 in 3NF and R2 in 2NF, and the redundancy of the original relation greatly reduced.

Figure 8.

Figure 9.
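Whether the final decomposition of the second example preserves the functional dependencies of S can be checked with the standard textbook test: starting from the left-hand side of a dependency, repeatedly take closures restricted to each relation and see whether the right-hand side is eventually reached. The sketch below assumes single-letter attributes and relations written as strings of those letters; the function name preserves is hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)

def preserves(fd, relations, fds):
    """Test whether `fd` follows from the projections of `fds` onto `relations`."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            # Attributes derivable from z using only dependencies inside `rel`.
            gained = (closure(z & set(rel), fds) & set(rel)) - z
            if gained:
                z |= gained
                changed = True
    return set(rhs) <= z

# Second example: S = {NR→AZ, ACS→Z, Z→CS}, final decomposition R1 and R2.
fds = [("NR", "AZ"), ("ACS", "Z"), ("Z", "CS")]
decomposition = ["NRAZ", "ACSZ"]
for fd in fds:
    print(fd, preserves(fd, decomposition, fds))
```

By contrast, replacing R2 with R3(Zip, City, State) alone would lose ACS→Z, since no single relation would then hold Address, City, State, and Zip together.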

5. Results and Discussion In addition to general relational database design, the algorithm presented in this paper is well-suited to teaching normalization in college-level database courses. Students frequently have great difficulty learning normalization, in part because they often lack the mathematical background necessary to give them an intuitive understanding of the processes involved. Enabling students to draw a picture illustrating the normalization process makes normalization less abstract and more concrete, and accordingly easier to learn. Possible future research directions include expanding this algorithm to work with higher normal forms, and with other types of databases.

6. Conclusions In this paper we have presented a graphical normalization algorithm for relational databases, demonstrated its efficacy, and suggested applications and research directions for that algorithm.

7. References
1. Date, C. J., An Introduction to Database Systems, Eighth Edition, Addison Wesley Longman, Inc., 2004.
2. Elmasri, R., and Navathe, S., Fundamentals of Database Systems, Third Edition, Addison Wesley, 2000.
3. Mitrovic, A., “NORMIT: a Web-Enabled Tutor for Database Normalization,” Proceedings of the International Conference on Computers in Education (ICCE '02), 2002.
4. Sanders, G., and Shin, S., “Denormalization Effects on Performance of RDBMS,” Proceedings of Hawaii International Conference on System Sciences, 2001.
