Business Component Identification: A Formal Approach

Hemant Jain [email protected]
University of Wisconsin – Milwaukee

Naresh Chalimeda, Navin Ivaturi, Balarama Reddy
Tata Consultancy Services

Abstract

Component Based Software Development is carried out in two phases: Component Building and Application Assembly. The key to building business components is a formal approach for identifying the components. This paper describes such an approach, which assists in identifying the components from an Analysis Level Object Model representing a business domain. The approach makes use of a clustering algorithm, certain constraints, a predefined rule and a set of heuristics. The approach has been implemented in a tool named ‘CompMaker’ and was used for identifying components for an auto insurance claims domain.

1. Introduction

Component Based Software Development (CBSD) is likely to revolutionize the process of building applications. It advocates an approach whereby applications are assembled from pre-built parts known as business components. A business component is the software implementation of an autonomous business concept or business process [3]. Thus in CBSD the process of building a business application can be considered to consist of two stages: Component Fabrication (building the Business Components) and Application Assembly (building a Business Application from Components). The Component Fabrication stage of CBSD consists of various phases: Domain Analysis and Modeling, Component Identification, Component Design & Implementation, Acceptance, and Roll Out & Deployment. This paper focuses on the ‘Identification Phase’ of Business Component Fabrication. The Component Identification phase groups closely related classes of a business domain into components.

2. Business Component Fabrication

The goal of the fabrication process is to design business components that can be reused within the same domain and may possibly be reused across domains. The challenge for the designer is to identify components that can be developed cost-effectively, are suitable for reuse, are easy to assemble into applications and to maintain, and allow the end application to be customized through proper selection and assembly of components. Identifying reusable artifacts is recognized as one of the greatest difficulties in classical software reuse [1]. Although the design issues of traditional reusable software artifacts such as code are discussed in the literature [8], the design issues of reusable business components are not adequately addressed [7,11].

Business components differ from traditional software artifacts, and therefore the design process must account for those differences. For instance, traditional reusable artifacts (e.g., code segments, objects, etc.) are mostly fine-grained and portray a low-level, technically oriented representation of the domain. Components, on the other hand, are more coarse-grained and are intended to provide a high-level, business-oriented representation of the domain. The fine-grained, technically oriented nature of traditional reusable artifacts such as objects prevents managers from working with them effectively. The coarse-grained, business-oriented approach in components, however, allows managers to identify the components that satisfy their business requirements and subsequently assemble them into full-scale business applications. In addition to granularity, the following key differences between components and traditional reusable artifacts have been identified [2,6,11]:

1. A component is a self-contained executable program that provides a specific service.
2. A component has an interface, which is used to communicate with other components.
3. A component could be used in a context that is unanticipated by its initial designers.

Hence, the design of components requires a unique perspective. This paper presents a formal approach for business component design. The approach supports the component designer in making trade-offs between multiple conflicting managerial goals, such as reduced development cost and increased reusability, identified above. The next section describes a formal approach for component identification.

3. Business Component Identification

This is a crucial phase in the overall Component Fabrication process. The approach proposed for identifying business components uses an analysis level domain model as input. We assume that the domain modeling has been done using an object-oriented approach. Thus, the domain model represents significant object classes (using UML notation), the structural relationships between object classes, and the use cases and sequence/interaction diagrams representing the dynamic relationships between the classes. A clustering approach is used to obtain an initial set of components. Consideration of supertype-subtype relationships and a set of heuristics enhance and refine the solution obtained from the clustering algorithm. The alternative solutions are evaluated based on the managerial goals, measured in terms of technical characteristics such as coupling, cohesion, and complexity [12]. The approach is described in detail in the next sub-sections.

3.1. The Clustering Algorithm

The process of component identification begins by grouping related classes of an analysis level domain model. A clustering approach is used to arrive at the initial grouping. Clustering approaches can be classified as hierarchical or non-hierarchical. Hierarchical clustering techniques are further divided into Agglomerative and Divisive techniques. An Agglomerative method involves a series of successive mergers whereas a Divisive method involves a series of successive divisions [5]. The approach proposed here makes use of a Hierarchical Agglomerative clustering algorithm for grouping the classes of the analysis level domain model. The strength of the relationships (static and dynamic) between the classes of the domain model is used as the basis for clustering the classes. The technique proceeds through a series of successive binary mergers (agglomerations), initially of individual entities (classes) and later of clusters formed during the previous stages. The classes having the highest relationship strengths are grouped first. The process continues until a cut-off point is reached. The process of computing relationship strength is described next.

3.2. Computing Class Relationship Strength

The clustering algorithm groups the classes on the basis of the strength of the relationships between them. Both static and dynamic relationships are used to compute this strength. Static relationships [9] are computed based on the associations between classes, and dynamic relationships are computed based on use cases and sequence diagrams. The static relationship represents the way various classes are related to each other. The use of the static relationship in the clustering process ensures that only related classes are clustered together. The dynamic relationship, on the other hand, represents the way various classes interact through messaging to support various business processes. Use cases and the corresponding sequence diagrams are used as the basis for computing the dynamic relationship between classes. Use cases are assigned relative weights based on their importance to the domain. The importance to the domain can be based on the criticality of the business process supported by the use case, its frequency, or any other consideration. The total relationship strength between a pair of classes is computed as follows. Consider a scenario in which Class i and Class j are structurally related and are used in one or more use cases, representing a dynamic relationship between them.

The strength of the static relationship (Sij) between classes i and j can be defined as:

$$S_{ij} = W_s \times N_{ij}$$

where $W_s$ is the static association weight and $N_{ij}$ is the total number of associations between classes i and j.

The strength of the dynamic relationship ($D_{ij}$) between classes i and j is defined as:

$$D_{ij} = \sum_{p \in P} U_{pi} \, U_{pj} \, W_p \, V_{ijp}$$

where
$P$ = the set of use cases,
$U_{pi}$ = 1 if use case p needs class i, and 0 if use case p does not need class i,
$W_p$ = the weight assigned to use case p,
$V_{ijp}$ = the number of messages between classes i and j in use case p.

$S_{ij}$ and $D_{ij}$ are scaled to a 0 to 1 scale. The designer has the option of assigning relative importance (RI) to the static and dynamic relationships. The total strength ($TS_{ij}$) of the relationship between two classes is computed as:

$$TS_{ij} = RI_s \, S_{ij} + RI_d \, D_{ij}, \qquad RI_s + RI_d = 1.0$$

Another factor, called the ‘Threshold Limit’ of the relationship strengths, is also used during the clustering process. The Threshold Limit denotes the stage at which the clustering algorithm puts an end to the series of successive mergers of classes. The designer can assign a value to this factor, thereby indicating the point at which the algorithm needs to stop the clustering process. The clustering process can also be constrained by defining the ‘Minimum number of components desired’ and the ‘Maximum number of classes that are allowed in a component’.
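The strength computation and the constrained agglomerative clustering described above can be illustrated with a minimal Python sketch. It is not the CompMaker implementation. Several details are assumptions made only for illustration: scaling to the 0 to 1 range is done by dividing by the maximum value, the strength between two clusters is taken as the average of the pairwise class strengths, and the class names and data layout (`assoc_counts`, `use_cases`) are hypothetical.

```python
from itertools import combinations


def pair(a, b):
    """Unordered class pair used as a dictionary key."""
    return frozenset((a, b))


def scale(values):
    """Scale to [0, 1] by dividing by the maximum (an assumed scaling rule)."""
    top = max(values.values(), default=0.0)
    return {k: (v / top if top else 0.0) for k, v in values.items()}


def total_strength(assoc_counts, use_cases, w_s=1.0, ri_s=0.5, ri_d=0.5):
    """TS_ij = RI_s * S_ij + RI_d * D_ij, with S and D scaled to [0, 1]."""
    s = {k: w_s * n for k, n in assoc_counts.items()}        # S_ij = W_s * N_ij
    d = {}
    for uc in use_cases:
        for k, messages in uc["messages"].items():
            if k <= uc["classes"]:        # U_pi * U_pj == 1: both classes needed
                d[k] = d.get(k, 0.0) + uc["weight"] * messages   # W_p * V_ijp
    s, d = scale(s), scale(d)
    return {k: ri_s * s.get(k, 0.0) + ri_d * d.get(k, 0.0) for k in set(s) | set(d)}


def link(c1, c2, ts):
    """Strength between two clusters: average pairwise TS (assumed linkage)."""
    return sum(ts.get(pair(a, b), 0.0) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(classes, ts, threshold, min_components, max_classes):
    """Merge the strongest pair of clusters until a stopping condition is met."""
    clusters = [frozenset([c]) for c in classes]
    while len(clusters) > min_components:
        candidates = [
            (link(a, b, ts), a, b)
            for a, b in combinations(clusters, 2)
            if len(a) + len(b) <= max_classes      # maximum classes per component
        ]
        if not candidates:
            break
        best, a, b = max(candidates, key=lambda t: t[0])
        if best < threshold:                       # Threshold Limit reached
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical four-class model with two weighted use cases.
classes = ["Claim", "Policy", "Vehicle", "Adjuster"]
assoc_counts = {
    pair("Claim", "Policy"): 1,
    pair("Policy", "Vehicle"): 1,
    pair("Claim", "Adjuster"): 1,
}
use_cases = [
    {"weight": 2.0, "classes": {"Claim", "Policy"},
     "messages": {pair("Claim", "Policy"): 3}},
    {"weight": 1.0, "classes": {"Claim", "Adjuster"},
     "messages": {pair("Claim", "Adjuster"): 2}},
]
ts = total_strength(assoc_counts, use_cases)
components = cluster(classes, ts, threshold=0.6, min_components=1, max_classes=2)
```

In this toy model Claim and Policy are merged first because they have the highest total strength. The size limit of two classes then blocks any further merge into that component, and the remaining pair falls below the threshold, so clustering stops with three components.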

3.3. Enhancement of the Clustering Solution

Placing the classes that are related through ‘inheritance’ in a single component can enhance the component identification solution obtained from the clustering algorithm. Taking the technical characteristics of the component design into consideration, one has to strive for tight cohesion within a component and loose coupling between components. Cohesion refers to the strength of association between elements (classes) in a component [12]. Coupling, on the other hand, refers to the extent to which classes within the component relate to classes that are not in that component [12]. If there is inheritance between classes, then it is more appropriate to place those classes in the same component because of the strong relationship (cohesion) between them. If the classes related through inheritance were distributed across components, the dependency (coupling) between components would increase. The ideal scenario is one in which the cohesion within a component is maximized and the coupling between components is minimized. In this approach we replicate each superclass by adding it to every component that contains one or more of its subclasses.
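The replication rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `parent_of` mapping (subclass to direct superclass, single inheritance) and the class names are hypothetical, and the sketch also pulls in indirect ancestors, which the text does not state explicitly.

```python
def replicate_superclasses(components, parent_of):
    """Add each superclass to every component that holds one of its subclasses."""
    result = [set(c) for c in components]
    for comp in result:
        changed = True
        while changed:                      # repeat so that ancestors are added too
            changed = False
            for cls in list(comp):
                parent = parent_of.get(cls)
                if parent is not None and parent not in comp:
                    comp.add(parent)
                    changed = True
    return result


# Hypothetical example: two claim subtypes placed in different components.
parent_of = {"CollisionClaim": "Claim", "TheftClaim": "Claim"}
initial = [{"CollisionClaim", "Adjuster"}, {"TheftClaim"}, {"Policy"}]
enhanced = replicate_superclasses(initial, parent_of)
```

Here `Claim` ends up in both of the first two components, so neither component depends on the other for its superclass.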

3.4. Evaluation of Solution

Vitharana (2000) identified five managerial goals of the component developer (cost effectiveness, ease of assembly, customization, reusability and maintainability) and five technical features of component design (coupling, cohesion, number of components, size of component and complexity) that are closely related to the managerial goals. He identified the relationship coefficients between the technical features and managerial goals from a survey. We adopt Vitharana’s model for evaluating the component identification solutions.

3.5. Heuristics

The Component Identification approach makes use of a set of heuristics for further refining the initial solution obtained from the clustering algorithm. The following two types of heuristics are supported:
• Automated
• Manual

Automated Heuristics: These heuristics are performed by the system when the designer opts for them. The automated heuristics available to the designer are Add heuristics, Move heuristics and Exchange heuristics. Each of these is described here.

Add Heuristics: In this type of heuristics, redundant assignment of classes to multiple components is used to arrive at a more desirable solution. At each iteration, a class is added to a component and the solution is evaluated in terms of the managerial goals associated with it. Since the evaluation model contains multiple conflicting objectives, a set of non-dominated solutions is generated and presented to the designer. The process is similar to the one used in [4]. Figure 1 depicts an iteration of Add heuristics.

Figure 1. Add Heuristics

Move Heuristics: In this type of heuristics, a class is moved from one component to another during an iteration. The managerial goal values are computed after every iteration. As in the case of Add heuristics, only the non-dominated solutions are displayed. Figure 2 depicts an iteration of Move heuristics. During the iteration, Class A is moved from Component 1 to Component 2. Unlike Add heuristics, classes are not redundantly assigned to components.

Figure 2. Move Heuristics

Exchange Heuristics: This heuristic operates by making even exchanges of classes between components. During an iteration of Exchange heuristics, a class from one component is exchanged with a class from another component. Figure 3 depicts the exchange of Classes A and X between Components 1 and 2 respectively.

Figure 3. Exchange Heuristics

Manual Heuristics: Unlike automated heuristics, which are performed by the system, manual heuristics are carried out by the designer (or any other person who possesses the domain knowledge). If the designer feels that a particular class is more appropriate in another component, he/she can move the class to that component. The manual heuristic is designed to give the designer an opportunity to fine-tune the components.
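The three automated heuristics can be viewed as neighbourhood generators whose results are filtered down to the non-dominated set. The Python sketch below illustrates that idea. It is not the CompMaker implementation, and it makes some assumptions: a solution is a list of sets of class names, every goal is oriented so that higher is better (a goal such as development effort would be negated by the caller), and the goal vectors in the example are invented for illustration rather than produced by the evaluation model.

```python
from itertools import combinations


def dominates(a, b):
    """True if goal vector a is at least as good as b everywhere and better once."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scored):
    """Keep the (solution, goals) pairs that no other pair dominates."""
    return [(sol, g) for sol, g in scored
            if not any(dominates(h, g) for _, h in scored)]


def add_neighbours(solution):
    """Add heuristic: copy one class into a component that lacks it."""
    all_classes = set().union(*solution)
    for j, comp in enumerate(solution):
        for cls in all_classes - comp:
            new = [set(c) for c in solution]
            new[j].add(cls)
            yield new


def move_neighbours(solution):
    """Move heuristic: move one class from its component to another one."""
    for i, src in enumerate(solution):
        for cls in src:
            for j in range(len(solution)):
                if i != j:
                    new = [set(c) for c in solution]
                    new[i].discard(cls)
                    new[j].add(cls)
                    yield [c for c in new if c]    # drop emptied components


def exchange_neighbours(solution):
    """Exchange heuristic: swap one class between two components."""
    for i, j in combinations(range(len(solution)), 2):
        for a in solution[i]:
            for b in solution[j]:
                if a in solution[j] or b in solution[i]:
                    continue                       # skip classes already shared
                new = [set(c) for c in solution]
                new[i].remove(a)
                new[i].add(b)
                new[j].remove(b)
                new[j].add(a)
                yield new


# Hypothetical two-component solution and invented goal vectors.
start = [{"A", "B"}, {"X"}]
moved = list(move_neighbours(start))
swapped = list(exchange_neighbours([{"A"}, {"X"}]))
front = non_dominated([("s1", (8, 7)), ("s2", (7, 7)), ("s3", (6, 9))])
```

In the example, `s2` is dropped because `s1` is at least as good on both goals and better on one, while `s1` and `s3` each win on a different goal and are both kept for the designer to choose from.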

4. Implementation of the Approach

The research team at the University of Wisconsin, Milwaukee, has used the above approach to build a component identification tool, ‘CompMaker’. The research program is a collaboration between Tata Consultancy Services (TCS), Asia’s largest software consultancy firm, and the University of Wisconsin, Milwaukee.

4.1. The CompMaker Tool

CompMaker is a Java based application, built using JBuilder Version 3.5 in a Windows NT environment. The steps involved in using this tool are briefly described below:
• Initially, a UML based Object Model representing the domain under consideration is developed. This model comprises use case diagrams, sequence diagrams and class diagrams.
• The model data are extracted by executing a script in the object-modeling tool (Adex Modeling Framework) [10].
• Once the component identification tool opens the model, it displays all the use cases that are present in the model and allows the user to assign weights to the use cases.
• Other weights required by the model and the constraints are then specified.
• The clustering algorithm is then run, and the initial solution, evaluated in terms of managerial goals, is displayed.
• The user can choose to apply automated heuristics to further refine the initial solution obtained from the clustering algorithm.
• Once a set of alternate non-dominated solutions is obtained, the user can modify the solution manually.
• Any solution thus obtained can be saved and retrieved at a later stage.

4.2. Application of the Approach on an Auto-Insurance Claims System

The component identification approach was applied to the Auto-Insurance claims domain. A team of domain experts from TCS developed the object model for this domain. The model contained 57 classes and 8 use cases. Each use case was assigned a weight based on its importance as determined by a TCS expert. Equal weight was assigned to the static and dynamic relationships. The cluster constraints, namely the minimum number of components and the maximum number of classes in a component, were assigned values of 20 and 3 respectively. The clustering algorithm was then executed. The enhanced version of the initial component identification solution was displayed after incorporating the ‘inheritance rule’ described above. The solution contained 26 components, and the managerial goals, represented on a ten-point scale, were computed (higher values are more desirable, except for development effort, where lower values are better). The values of the goals obtained are shown in the first row of Table 1.

Heuristics were used to further refine this solution. Exchange heuristics were applied first. The second row of Table 1 shows the solution obtained. We see that the values are better in terms of development effort. Please note that the managerial goal values provide a relative comparison between solutions. Thus, all other values being equal, a solution with a reusability of 8 is better than a solution with a reusability of 7. The solution was later subjected to Move heuristics. This resulted in a set of non-dominated solutions. The values of the selected solution are shown in the third row of Table 1. In this solution one can see that although the cost has increased slightly, all the other values have improved.

Table 1. Component Identification Solutions
Managerial goals (columns): Dev. Efforts, Ease of Assembly, Customization [remaining columns and all cell values not recoverable]
Solutions (rows): Initial Solution, Exchange Heuristics, Move Heuristics, Add Heuristics

In the next step, Add heuristics were applied. From the resulting set of non-dominated solutions, a preferred solution was selected; its managerial goal values are shown in row four of Table 1. This solution improves on two factors, cost and ease of assembly, but not on maintainability. Overall the solution seems to have improved. Manual heuristics were then performed by moving classes from one component to another where that seemed more appropriate. This procedure was repeated with a different set of values for the clustering constraints: the minimum number of components was set to ‘10’ and the maximum number of classes to ‘6’. The solution yielded 20 components. The TCS expert felt that the final solution obtained was a satisfactory design and found the tool useful.

5. Conclusion

There is little literature that discusses a standard methodology for identifying components from a set of classes. The component identification approach discussed in this paper represents a formal approach to identifying components. Such an approach is likely to enhance the subsequent phase of component assembly.

