Normalization of object-oriented design

Fakhar Lodhi* and Hassan Mehdi**
*Software Engineering Research Center, National University of Computer and Emerging Sciences, Lahore.
**Xavor Pakistan (Pvt.) Ltd., Lahore

Abstract – The object-oriented approach does not follow any formal design process and is mostly ad hoc in nature. This makes it more of an art than a science. The quality of the resulting design therefore depends to a large extent on the skills of the individual designer and cannot be evaluated easily. We believe that the theory of normalization, available to the designer of relational database systems, can also be applied to object-oriented design. This paper outlines the process of applying the theory of normalization to object-oriented design and shows that, if applied properly, it rids the design process of the aforementioned problems and yields a better object model by bringing formalism and taking a scientific approach.

Keywords: Functional dependency, normal forms, object-oriented design, relational model

1 INTRODUCTION

Semantic data models in the form of Entity/Relationship (E/R) diagrams are used as the de facto standard for database design [1]. Object-oriented design and data modeling differ in that data models focus on the structural components, while object-oriented models also include behavioral abstractions in the form of operations embedded within types [2] [3] [4]. It has, however, been noted that entities in the E/R model are not very different from objects in the object model and that the distinction between the two models is not as sharp as it might seem [5]. Nevertheless, the realization of these models in an actual implementation is typically very different. From a process perspective, in database design the model is analyzed through the formal process of normalization and then translated into a physical implementation [6] [7] [8]. The object-oriented approach, on the other hand, does not follow any formal process and is mostly ad hoc in nature [9].
From an OO point of view, unrestricted functional dependencies may negate the notions of sufficiency and delegation and introduce functional blobs, thus negatively impacting the quality of the design [2] [10].

In the theory of normal forms, the first normal form (1NF) takes a special place; a database model cannot be relational unless it is in 1NF. The first normal form insists that attribute value entries are atomic. That is, there are no repeating groups or lists, and values must be primitive data types. The first normal form is fundamental to the notion of relational databases since it imposes the restriction of defining and storing data in the form of a set of two-dimensional tables. The rest of the normal forms (2NF to 5NF) deal with the issues of data redundancy and update anomalies, and a model that is not fully normalized suffers from both these problems [6] [7] [11] [12] [13].

Central to the theory of normal forms is the concept of functional dependencies [6] [7] [11] [12] [13]. Functional dependencies express concrete relationships in the real world and depict semantic relationships among data. If not managed properly, they introduce data redundancy and update anomalies. Normal forms provide application designers with a formal framework for analyzing data along with their functional dependencies. Data normalization is nothing but a series of tests on data. Given the functional dependencies, these tests can be carried out on individual data groups and help the application designer to normalize data to any degree. When a test fails, the group of data violating that test must be decomposed into smaller subgroups that individually meet the normalization test, ultimately producing a model that is freer of data redundancies and update anomalies.

We believe that these normal forms are also applicable to object-oriented design. In Section 2 we outline a normalization mechanism for object-oriented design and demonstrate that the application of normal forms (2NF to 5NF) eliminates data redundancy and functional blobs and produces a better object-oriented design.

2 NORMALIZING THE OBJECT-ORIENTED DESIGN

From an object-oriented perspective, 1NF is too restrictive, as it forbids the storage of complex objects, so that abstract data types cannot be dealt with. However, we believe that the other normal forms are applicable to object-oriented design.

The proposed process for normalization of the object model is similar to the normalization process in the relational domain during which unsatisfactory classes are decomposed by breaking up their attributes as well as behavior into smaller classes that possess desirable properties. The following subsections elaborate this process of applying 2nd to 5th normal forms to the object model.
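The dependency tests that drive this decomposition can be illustrated with a small sketch. The code below is not part of the proposed process itself; it only assumes that sample data is available as a list of dictionaries and checks whether a functional dependency holds in that sample. The function name `fd_holds` and the sample rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    The dependency holds when equal lhs values always imply equal rhs values.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data, invented for illustration only
orders = [
    {"orderID": 1, "customerID": "C1", "customerName": "Ali"},
    {"orderID": 2, "customerID": "C1", "customerName": "Ali"},
    {"orderID": 3, "customerID": "C2", "customerName": "Sara"},
]

print(fd_holds(orders, ["customerID"], ["customerName"]))  # True
print(fd_holds(orders, ["customerName"], ["orderID"]))     # False
```

A check of this kind can only refute a dependency on sample data; whether a dependency really holds is a statement about the problem domain and has to come from the designer.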

[Figure 1 diagram: class Order with private attributes ID, date, totalPrice, lineItems, customerID, customerName, customerAddress, customerPhone, and customerFax, and a public get and set operation for each attribute.]

2.1 Cohesion and Second Normal Form (2NF)

One of the fundamental challenges in object-oriented design is the design of cohesive abstractions or objects. An abstraction is considered highly cohesive if all the information and functionality it contains are related to each other [2]. This means that all the attributes of an object should not only be needed for the definition of the abstraction but should also be directly related to the object. From a relational perspective, a relation R is said to be in Second Normal Form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key of R [7] [8]. In other words, for a relation to be in 2NF, all its non-key attributes should depend upon the whole primary key. It is easy to see that this is also the requirement for cohesion. Therefore, applying the principles of 2NF would yield cohesive objects. This is illustrated by the following example.

The Order class shown in Figure 1 represents an Order entity that contains the attributes and behavior of a specific order. From the functional dependency point of view, we can easily detect that the attributes customerName, customerAddress, customerPhone, and customerFax are not functionally dependent upon a specific Order, and hence this class is not in 2NF. The application of the second normal form would move the non-prime, partially dependent attributes into a new class Customer, where they would be fully dependent upon the customerID, as shown in Figure 2. This generates a model with more cohesive classes. It is pertinent to note that, unlike a relation, an object has behavior embedded within its type, so the methods that manipulate the partially dependent attributes would also be moved to the newly created class.
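The result of this step can be sketched in code. The following is a minimal illustration of the decomposed model, assuming Python dataclasses stand in for the classes of the design; the method `change_address` and all sample values are hypothetical and only show that customer behavior moves together with the customer attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    """Holds the attributes that depend on the customer, not on an order."""
    id: str
    name: str
    address: str
    phone: str
    fax: str

    def change_address(self, new_address: str) -> None:
        # Behavior that manipulates customer data lives with that data.
        self.address = new_address


@dataclass
class Order:
    """Keeps only the attributes that depend on the order itself."""
    id: int
    date: str
    total_price: float
    customer: Customer
    line_items: List[str] = field(default_factory=list)


# Hypothetical sample objects, invented for illustration only
customer = Customer("C1", "A. Khan", "Lahore", "555-0100", "555-0101")
first = Order(1, "2004-01-05", 120.0, customer)
second = Order(2, "2004-02-11", 80.0, customer)

# One update is visible through every order of the same customer,
# so the update anomaly of the non-cohesive Order class cannot occur.
customer.change_address("Karachi")
print(first.customer.address, second.customer.address)  # Karachi Karachi
```

In the class of Figure 1 the same change would have to be repeated on every Order object that carries a copy of the customer's address.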
Figure 1. A non-cohesive class

[Figure 2 diagram: class Order with attributes ID, date, totalPrice, and lineItems, and class Customer with attributes ID, name, address, phone, and fax; each class has a get and a set operation per attribute, and the two classes are joined by an association marked 1 at each end.]

Figure 2. Order class after applying 2NF

2.2 Third Normal Form (3NF) and Functional Blobs

Two of the most important principles of object-oriented design are the notions of information hiding and delegation of responsibility. These principles are overlooked in many designs. A typical example of such a problem is when an object explicitly maintains information about a relationship among some of its parts instead of delegating this responsibility to the appropriate sub-objects. This gives rise to manager objects or functional blobs [14]. Application of the 3rd normal form rectifies this problem.

The third normal form is based on the concept of transitive dependency [7] [8]. A relation R is in 3NF if it is in 2NF and no non-prime attribute of R is transitively dependent upon the primary key. This basically means that if an object explicitly maintains information about a relationship among some of its parts instead of delegating this responsibility to the appropriate sub-objects, then it is not in 3NF.

To understand this, let us consider the model of Figure 3. This object model has three classes with the following semantics: the Project class represents a project, the Version class maintains information about a specific version of a project, and the ProjectVersion class maintains information about different projects and their versions.

[Figure 3 diagram: class Project with attributes ID, name, manager, and completionDate; class Version with attributes ID, name, and description; class ProjectVersion with attributes mapID, projectID, and versionID and operations assignProjectVersion() and getProjectVersion(). Project and Version have a get and a set operation per attribute. The associations carry multiplicities 1 and *.]

Figure 3: A functional blob

[Figure 4 diagram: class Project with attributes ID, name, manager, completionDate, and version, a get and a set operation per original attribute, and the operations getProjectVersion() and assignVersion(); class Version with attributes ID, name, and description; class ProjectVersion with attributes mapID and projectID only. The associations carry multiplicities 1 and *.]

Figure 4: Model after applying 3NF

From the functional dependency point of view, Version is functionally dependent upon the ProjectVersion class as well as the Project class. A specific Version object is actually related only to a particular Project object, and ProjectVersion is just acting as a middleman. It is easy to see that this model is not in 3NF. As a consequence, in order to retrieve the version information of a specific project, we would use the ProjectVersion class instead of the Project class, and hence would create a functional blob.

Application of the third normal form would remove the association between the ProjectVersion and Version classes and create a new relationship between Project and Version, thus removing the transitive dependency. Moving the version data out of the ProjectVersion class would also move the corresponding behavior into the Project class, as shown in Figure 4. It is now evident that the ProjectVersion class is of no use, as it provides no meaningful functionality. It was previously meant to maintain a map of projects and their versions; after the application of the third normal form, the project itself maintains its versions, and the ProjectVersion class has no behavior. So in the second iteration we would remove the useless ProjectVersion class and get the model shown in Figure 5. From this example we can see that the application of the third normal form not only avoids functional blobs but also helps to eliminate useless classes.
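The final model can be sketched as follows. This is a minimal illustration, assuming plain Python classes; the method names follow the assignVersion() and getProjectVersion() operations of the Project class in snake_case, and the sample values are hypothetical.

```python
class Version:
    """A specific version of a project."""

    def __init__(self, id, name, description):
        self.id = id
        self.name = name
        self.description = description


class Project:
    """After 3NF the project itself maintains its versions."""

    def __init__(self, id, name, manager, completion_date):
        self.id = id
        self.name = name
        self.manager = manager
        self.completion_date = completion_date
        self._versions = []  # replaces the project-to-version map

    def assign_version(self, version):
        self._versions.append(version)

    def get_project_versions(self):
        # Return a copy so callers cannot change the internal list.
        return list(self._versions)


# Hypothetical sample objects, invented for illustration only
project = Project(1, "Payroll", "F. Ahmed", "2005-06-30")
project.assign_version(Version(10, "1.0", "Initial release"))
project.assign_version(Version(11, "1.1", "Bug fixes"))

print([v.name for v in project.get_project_versions()])  # ['1.0', '1.1']
```

No separate mapping object is consulted to answer the query; the responsibility has been delegated to the object that owns the relationship.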

[Figure 5 diagram: class Project with attributes ID, name, manager, and completionDate, and class Version with attributes ID, name, and description; each class has a get and a set operation per attribute, and the two classes are joined by an association with multiplicities 1 and *.]

Figure 5: Model after removing ProjectVersion

2.3 The Fourth and Fifth Normal Forms

Both the 4th and 5th normal forms deal with the concept of multi-valued dependency (MVD) [12]. The multi-valued fact may correspond to a many-to-many relationship, as with courses and students, or to a many-to-one relationship, as with the employees of a company (assuming that a person works for only one company). Under the fourth normal form, a record type should not contain two or more independent multi-valued facts about an entity. In addition, the record must satisfy the third normal form. A relational schema R is not in 4NF if it represents two independent multi-valued relationships. Apart from the issue of redundancies, the main problem with violating the fourth normal form is that it leads to uncertainties in the maintenance policies, eventually resulting in complex algorithms [15].

The only difference between the 4th and 5th normal forms is that in the case of the 5th normal form there exists a symmetric constraint in which all the classes are related with one another. In the absence of such a constraint, a record type in fourth normal form is always in fifth normal form. One advantage of the fifth normal form is that certain redundancies can be eliminated.

[Figure 6 diagram: n-ary association class XYZ connecting classes X, Y, and Z.]

Figure 6: Modeling MVDs using n-ary association classes.

[Figure 7 diagram: classes X, Y, and Z joined by binary associations with multiplicities 1 and *.]

Figure 7: Model in 4th normal form – Y and Z are independent.

[Figure 8 diagram: classes X, Y, and Z joined by binary associations with multiplicity * at each end.]

Figure 8: Model in 5th normal form – X, Y and Z are symmetrically related to each other.
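The notion of two independent multi-valued facts can be made concrete with a small test. The sketch below assumes a relation over exactly three single attributes, here named x, y, and z; it checks whether the y and z values recorded for each x value are independent, that is, whether every combination of them appears. The function name `mvd_holds` and the sample rows are hypothetical.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Return True if y and z are independent multi-valued facts about x.

    For every x value, the recorded (y, z) pairs must equal the full
    cross product of the y values and the z values seen with that x.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {pair[0] for pair in pairs}
        zs = {pair[1] for pair in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical sample data: every y of x1 is paired with every z of x1
independent = [
    {"x": "x1", "y": "y1", "z": "z1"},
    {"x": "x1", "y": "y1", "z": "z2"},
    {"x": "x1", "y": "y2", "z": "z1"},
    {"x": "x1", "y": "y2", "z": "z2"},
]
# Here the pairing itself carries information, so the facts are not independent
paired = [
    {"x": "x1", "y": "y1", "z": "z1"},
    {"x": "x1", "y": "y2", "z": "z2"},
]

print(mvd_holds(independent, "x", "y", "z"))  # True
print(mvd_holds(paired, "x", "y", "z"))       # False
```

When the test succeeds, the record type stores every combination redundantly and can be split into an (x, y) part and an (x, z) part without losing information.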

In an object-oriented model, a multi-valued relationship among several classes is sometimes modeled using an n-ary association class. An example of this type of association class is the XYZ class in Figure 6. This class is nothing but a map of the n-ary relationship among objects of the X, Y, and Z classes. This situation may call for 4th or 5th normalization. If applicable, the 4th and 5th normalizations would result in the models of Figures 7 and 8, respectively.

If none of the above conditions for MVDs is present, then no normalization is required. That is, if the classes involved in MVDs are asymmetrically related, then no normalization can be applied.

These concepts are elaborated with the help of the following examples.

2.3.1 The Fourth Normal Form

Let us consider the following problem: we have to make a system in which we keep track of the books and movies owned by a person. Suppose that an association class Catalog is used to manage this information, as shown in Figure 9. This solution is not in 4th normal form because the Movies and the Books are multi-valued dependent upon the Person and are not related to each other. Applying the 4th normal form produces the solution of Figure 10, which is clearly a better solution to this problem and is free of the unnecessary complexities associated with the previous solution.

[Figure 9 diagram: classes Movie, Person, and Book, with multiplicity * on the association.]

Figure 9: Independent MVDs.

[Figure 10 diagram: classes Book, Catalog, Movie, and Person, joined by associations with multiplicities * and 1.]

Figure 10: Model after applying 4NF.

2.3.2 The Fifth Normal Form

Let us consider Date's classical problem of Suppliers, Parts, and Projects, where a supplier supplies parts, the parts are used by different projects, and the supplier supplies the parts to some specific projects [8]. This relationship between the Suppliers, Parts, and Projects is modeled by the SPJ class in Figure 11. Let us assume that an additional symmetric constraint is added as follows: if a supplier sells a certain part and that part is used in a certain project, then the supplier supplies to that project. This means that the model is not in 5th normal form. Application of the 5th normal form would result in the removal of the n-ary relationship among the Project, Part, and Supplier classes, and the required relationship would now be modeled with the help of three binary relationships: Supplier-Project, Supplier-Part, and Project-Part. The new model is shown in Figure 12.
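Whether the three binary relationships can replace the n-ary one may be tested on sample data. The sketch below assumes that the SPJ information is given as a set of (supplier, part, project) tuples and uses the standard relational test that the set equals the join of its three binary projections; it is an illustration, not a procedure taken from this paper. The function names and the sample tuples are hypothetical.

```python
def join_of_projections(spj):
    """Rebuild (supplier, part, project) tuples from the three binary views."""
    supplier_part = {(s, p) for s, p, j in spj}
    part_project = {(p, j) for s, p, j in spj}
    supplier_project = {(s, j) for s, p, j in spj}
    return {
        (s, p, j)
        for (s, p) in supplier_part
        for (p2, j) in part_project
        if p2 == p and (s, j) in supplier_project
    }


def decomposable(spj):
    """True if the three binary relationships lose and add no information."""
    return join_of_projections(spj) == set(spj)


# Hypothetical data that respects the symmetric constraint
symmetric = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}
# Same data without the last tuple: the constraint no longer holds
asymmetric = symmetric - {("S1", "P1", "J1")}

print(decomposable(symmetric))   # True
print(decomposable(asymmetric))  # False
```

In the second data set the join of the binary views produces the tuple ("S1", "P1", "J1"), which was never recorded, so the n-ary association class would have to be kept for such data.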

[Figure 11 diagram: n-ary association class SPJ connecting classes Supplier, Project, and Part, with multiplicity * on each end.]

Figure 11: MVD with symmetric relationship [3]

[Figure 12 diagram: classes Supplier, Project, and Part joined pairwise by binary associations with multiplicity * at each end.]

Figure 12: Model after applying 5th NF

2.3.3 MVDs with no normalization

Let us now consider the case where the classes involved in MVDs are asymmetrically related and hence the model does not need 4th or 5th normalization. This can be elaborated with the help of the Jockey, Horse, and Race problem [16]. In this case, we need to represent a jockey riding a horse in a race. This means that all three of them are related, but the relationship is not symmetric, as a jockey can ride the same horse in different races, and hence a jockey-horse combination does not uniquely identify a race. In order to be able to answer various queries, all three pieces of data (jockey, horse, and race) need to be stored together. Therefore, in this case, the n-ary association class tying the three pieces of information together has been used appropriately and does not need any normalization.

3 SUMMARY AND CONCLUSION

In this paper we have shown that the principle of normalization can also be applied to object-oriented design. The basic problems of cohesion and functional blobs in an object model can be addressed by ensuring that the model satisfies the 2nd and 3rd normal forms, respectively. Furthermore, it was also shown that the optimality of a design for modeling n-ary relationships can be checked by applying the tests for the 4th and 5th normal forms. This framework thus provides us with a formal mechanism for methodically analyzing an object-oriented design and improving its overall quality by applying these normalizations in a systematic and scientific manner.

REFERENCES

[1] Chen, P., The Entity-Relationship Model: Towards a Unified View of Data, ACM Transactions on Database Systems 1(1), 9-36, 1976.
[2] Booch, G., Object Oriented Design with Applications, 2nd Ed., Benjamin/Cummings, 1991.
[3] Rumbaugh, J. et al., Object Oriented Modeling and Design, Prentice Hall, 1991.
[4] Rumbaugh, J., Booch, G., and Jacobson, I., The Unified Modeling Language Reference Manual, Addison-Wesley, 1999.
[5] King, R., My cat is Object-Oriented, in Kim, W., and Lochovsky, F. H. (Eds.), Object Oriented Concepts, Databases and Applications, Addison-Wesley, 1989.
[6] Codd, E., A Relational Model of Data for Large Shared Data Banks, CACM 13(6), 377-387, 1970.
[7] Codd, E., Further Normalization of the Database Relational Model, in R. Rustin (ed.), Database Systems, Prentice Hall, 1972.
[8] Date, C. J., An Introduction to Database Systems, 7th Ed., Addison-Wesley, 2000.
[9] Graham, I., Object Oriented Methods: Principles and Practice, 3rd Ed., 2001.
[10] Coad, P., North, D., and Mayfield, M., Object Models: Strategies, Patterns, and Applications, Prentice Hall, 1997.
[11] Fagin, R., A Normal Form for Relational Databases that is based on Domains and Keys, ACM Transactions on Database Systems 6(3), 387-415, 1981.
[12] Fagin, R., Multivalued Dependencies and a New Normal Form for Relational Databases, ACM Transactions on Database Systems 2(3), 1977.
[13] Fagin, R., Normal Forms and Relational Database Operators, ACM SIGMOD International Conference on Management of Data, 1979.
[14] Riel, Arthur J., Object-Oriented Design Heuristics, Addison-Wesley, 1996.
[15] Kent, William, A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM 26(2), 120-125, Feb. 1983.
[16] Rumbaugh, James, OMT Insights: Perspectives on Modeling from the Journal of Object-Oriented Programming, SIGS, 1996.