A Cohesion Measure for Object-Oriented Classes

Heung Seok Chae, Yong Rae Kwon, and Doo Hwan Bae
Division of Computer Science, Department of Electrical Engineering & Computer Science
Korea Advanced Institute of Science and Technology (KAIST)
Mailing address: Division of Computer Science, Department of Electrical Engineering & Computer Science, Korea Advanced Institute of Science and Technology, 373-1, Kusong-dong, Yusung-gu, Taejon 305-701, Korea
E-mail address: hschae, kwon, [email protected]
Phone number: +82-42-869-4375
Fax number: +82-42-869-3510


September 16, 2000

SUMMARY

In object-oriented systems, cohesion refers to the degree of relatedness of the members of a class, and strong cohesion has been recognized as a highly desirable property of classes. We note that the existing cohesion measures do not take into account some characteristics of classes and thus often fail to properly reflect their cohesiveness. To cope with this problem, we propose a new cohesion measure that incorporates these characteristics. Our measure takes into account the members that actually have an impact on the cohesiveness of a class, and it is defined in terms of the degree of connectivity among those members. We develop a cohesion measurement tool for C++ programs and perform a case study on a well-known class library in order to demonstrate the effectiveness of the new measure. By performing principal component analysis, we also demonstrate that our measure captures a new aspect of class properties that is not captured by the existing cohesion measures.

Keywords: object-orientation, software metric, cohesion

1 Introduction

Cohesion, a concept originating in structured design, refers to the relatedness of the elements in a module [32]. A highly cohesive module is one whose elements are closely related in order to provide the single function of the module. Conversely, a module with low cohesion has some elements that have little relationship with the others, which indicates that the module seems to provide several unrelated functions. It is widely accepted that the higher the cohesion of a module, the easier the module is to develop, maintain, and reuse, and the less fault-prone it is. There is also empirical evidence supporting the importance of cohesion in structured design [9, 10]. Therefore, virtually every software engineering text describes strong cohesion as a highly desirable property of a module. Many cohesion measures have been proposed to quantify the cohesion of a module and to help developers design modules with greater cohesion.

The object-oriented paradigm has been introduced throughout the software industry with the expectation that software developed using it will be reliable, easier to maintain, and more likely to be reused. Various claims have even been made that the object-oriented paradigm is the silver bullet for solving the software crisis [15, 16]. Among the fundamental concepts of the paradigm, the notion of a class is the major factor behind such optimistic expectations. As abstract representations of objects in a problem domain, classes provide a consistent medium that minimizes the gaps between successive steps of the entire software development process. In addition, classes serve as a unit of encapsulation; that is, instances of a class can be manipulated only through the interface defined in the class. Therefore, the internal implementation of a class can be changed without affecting any clients as long as the new implementation conforms to the interface. In other words, compatible changes can be made safely to classes, which facilitates program evolution and maintenance [31]. It is well known that the advantages of the object-oriented paradigm rest mainly on the notion of a class.
Thus, all object-oriented design methods emphasize the importance of correctly identifying classes from the application domain, and developers spend significant time and effort identifying the essential classes relevant to the system. Although each object-oriented design method provides its own guidelines for identifying a set of classes from the application domain, there is general agreement that a class should be created to abstract the state and behavior of similar objects through its members (i.e., instance variables and methods). However, poorly designed classes may result from the inappropriate use of object-oriented concepts during the analysis and design phases, or from uncontrolled change during maintenance. A class may be poorly designed if it fails to represent the features of the corresponding objects through its members; that is, the class has disparate, unrelated members, or two or more different kinds of objects are captured by one class.

As in structured design, the notion of cohesion can be used to identify poorly designed classes. If a class is not a proper representation of some objects but just a collection of unrelated members, then the relationships among its members may not be strong, and such a class will show weak cohesion. Conversely, if a class properly captures the features of objects, all its members are closely related, which leads to strong cohesion. Indeed, many popular object-oriented methods mention that cohesion can be used as an indicator of how well classes are designed: "A class should not be a collection of unrelated members, but all the members of a class should work together to provide some behaviors of the corresponding objects" [30]; "A class with low cohesion would suggest inappropriate or accidental abstraction" [4].

Much recent research has aimed to quantify the cohesiveness of classes [3, 6, 13, 14, 21, 22, 26]. The existing cohesion measures attempt to measure the cohesiveness of a class in terms of the interactions among its methods and instance variables. However, some characteristics of classes have not been considered in these measures, which can lead to cohesion values that differ from the general intuition about cohesion. First, the existing measures do not differentiate certain kinds of methods that are designed to show a specific behavior and therefore inherently interact with only some of the instance variables. Because the existing measures do not take the property of such methods into account, they fail to properly reflect the actual cohesiveness of classes. Second, the existing measures do not consider the patterns of interaction among class members; they simply count the number of instance variables referenced by methods or the number of method pairs that share instance variables. Such simple counting criteria, which ignore the interaction patterns among class members, can lead to results that differ from what is expected.
In this paper, we propose a new cohesion measure that copes with these problems by incorporating our observations on the characteristics of classes. We present a cohesion measurement tool developed to automate the computation of various cohesion measures, including our new measure, together with the results of a case study performed on a well-known class library. By performing principal component analysis, we also demonstrate that our new measure captures a new aspect of class properties that is not captured by the existing measures.

The remainder of this paper is organized as follows. Section 2 provides an overview of current research on object-oriented cohesion measures and highlights their weaknesses. Section 3 proposes a new cohesion measure that resolves the problems of the existing cohesion measures. Section 4 describes the cohesion measurement tool we developed, and Section 5 presents our empirical study of the InterViews system, including principal component analysis. Finally, conclusions and future work are presented in Section 6.

2 Cohesion Measures for Classes

As object-oriented methodology has gained popularity in industrial and academic settings, research has been carried out on object-oriented cohesion measures. The major existing cohesion measures were recently reviewed in [6]. Table 1 presents abbreviated definitions of these measures.

Table 1: The existing cohesion measures

LCOM1 [13] (Lack of Cohesion in Methods): The number of pairs of methods that share no instance variables.

LCOM2 [14]: Let P be the set of pairs of methods without shared instance variables, and Q the set of pairs of methods with shared instance variables. Then $\mathrm{LCOM2} = |P| - |Q|$ if $|P| > |Q|$, and $\mathrm{LCOM2} = 0$ otherwise.

LCOM3 [26]: Consider an undirected graph G whose vertices are the methods of a class, with an edge between two vertices if the corresponding methods share at least one instance variable. Then $\mathrm{LCOM3} = |\{\text{connected components of } G\}|$.

LCOM4 [22]: Like LCOM3, except that G additionally has an edge between the vertices representing methods $M_i$ and $M_j$ if $M_i$ invokes $M_j$ or vice versa.

Co [22]: Let V be the vertices of graph G from LCOM4, and E its edges. Then $\mathrm{Co} = 2 \cdot \frac{|E| - (|V| - 1)}{(|V| - 1)(|V| - 2)}$.

LCOM5 [21]: Consider a set of methods $\{M_i\}$ ($i = 1, \ldots, m$) accessing a set of instance variables $\{A_j\}$ ($j = 1, \ldots, a$), and let $\mu(A_j)$ be the number of methods that reference $A_j$. Then $\mathrm{LCOM5} = \frac{\frac{1}{a}\sum_{j=1}^{a} \mu(A_j) - m}{1 - m}$.

Coh [6]: A variation on LCOM5: $\mathrm{Coh} = \frac{\sum_{j=1}^{a} \mu(A_j)}{m \cdot a}$.

LCC [3] (Loose Class Cohesion): Let NP be the maximum number of public method pairs, that is, $NP = \frac{N(N-1)}{2}$ for N public methods, and let NIC be the number of direct or indirect connections between public methods. LCC is the relative number of directly or indirectly connected public method pairs: $\mathrm{LCC} = \frac{NIC}{NP}$.

TCC [3] (Tight Class Cohesion): Let NDC be the number of direct connections between public methods. TCC is the relative number of directly connected public method pairs: $\mathrm{TCC} = \frac{NDC}{NP}$.
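The counting and graph-based measures of Table 1 can be computed directly from per-method reference sets. The C++ sketch below is a hypothetical illustration, not the tool described in Section 4: it assumes methods and instance variables are numbered from 0, that `refs[i]` holds the variables referenced by method i, and that `calls` lists (caller, callee) pairs used only for LCOM4 and Co. The names `lcom1`, `lcom2`, `methodGraph`, and `co` are ours.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical encoding: refs[i] = instance variables referenced by method i;
// calls = (caller, callee) method pairs, used only by LCOM4 and Co.
using RefSets = std::vector<std::set<int>>;
using Calls = std::vector<std::pair<int, int>>;

// True if methods i and j reference at least one common instance variable.
bool shareVariable(const RefSets& refs, int i, int j) {
    for (int v : refs[i])
        if (refs[j].count(v) > 0) return true;
    return false;
}

// LCOM1: number of method pairs that share no instance variable (this is |P|).
int lcom1(const RefSets& refs) {
    int n = static_cast<int>(refs.size());
    int p = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (!shareVariable(refs, i, j)) ++p;
    return p;
}

// LCOM2: |P| - |Q| if |P| > |Q|, otherwise 0.
int lcom2(const RefSets& refs) {
    int n = static_cast<int>(refs.size());
    int p = lcom1(refs);
    int q = n * (n - 1) / 2 - p;   // pairs that do share a variable
    return p > q ? p - q : 0;
}

// Method graph of LCOM3 (shared variables) or LCOM4 (plus invocations).
// Returns {number of connected components, number of distinct edges}.
std::pair<int, int> methodGraph(const RefSets& refs, const Calls& calls, bool withCalls) {
    int n = static_cast<int>(refs.size());
    std::set<std::pair<int, int>> edges;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (shareVariable(refs, i, j)) edges.insert(std::make_pair(i, j));
    if (withCalls)
        for (const auto& c : calls)
            if (c.first != c.second)
                edges.insert(std::make_pair(std::min(c.first, c.second),
                                            std::max(c.first, c.second)));
    // Union-find over the methods to count connected components.
    std::vector<int> parent(n);
    for (int i = 0; i < n; ++i) parent[i] = i;
    int components = n;
    for (const auto& e : edges) {
        int a = e.first, b = e.second;
        while (parent[a] != a) a = parent[a];
        while (parent[b] != b) b = parent[b];
        if (a != b) { parent[a] = b; --components; }
    }
    return std::make_pair(components, static_cast<int>(edges.size()));
}

// Co = 2 * (|E| - (|V| - 1)) / ((|V| - 1)(|V| - 2)), on the LCOM4 graph.
double co(const RefSets& refs, const Calls& calls) {
    assert(refs.size() > 2);   // the formula is undefined for fewer than three methods
    double v = static_cast<double>(refs.size());
    double e = methodGraph(refs, calls, true).second;
    return 2.0 * (e - (v - 1.0)) / ((v - 1.0) * (v - 2.0));
}
```

LCOM3 is `methodGraph(refs, Calls(), false).first` and LCOM4 is `methodGraph(refs, calls, true).first`. For a hypothetical class whose four methods reference {0}, {0, 1}, {1}, and {2}, this gives LCOM1 = 4, LCOM2 = 2, and LCOM3 = 2; adding an invocation from method 3 to method 0 gives LCOM4 = 1 and Co = 0.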


2.1 Special Methods

Most of the existing cohesion measures regard the interactions between the members (i.e., methods and instance variables) of a class as the dominant factor in calculating cohesion. However, some methods in a class are designed to show a specific behavior and inherently interact with only some of the instance variables in the class. The existing cohesion measures do not consider the nature of such methods, which may result in an incorrect evaluation of cohesiveness. We call such methods special methods. Access methods, delegation methods, constructors, and destructors can be classified as special methods.

An access method is one whose behavior is only to retrieve or update an instance variable; consequently, it typically references only one instance variable. For example, Figure 1 shows the definition of class Stack. Method GetTop() just retrieves the value of instance variable top and interacts only with top.

```cpp
// Natural (a counter type offering IsZero(), ++, --, and <) and the constant MAX
// are not shown in the figure; they are assumed to be declared elsewhere.
class Stack {
private:
    int *items;
    Natural top;
public:
    // constructor
    Stack() { top = 0; items = new int[MAX]; }
    // destructor
    ~Stack() { delete [] items; }
    // access method
    Natural GetTop() { return top; }
    // delegation method
    int IsEmpty() { return top.IsZero(); }
    int Pop() { if (!IsEmpty()) return items[--top]; return 0; }   // 0 when empty
    void Push(int n) { if (top < MAX - 1) items[top++] = n; }
};
```

Figure 1: The code for class Stack

We believe that access methods should not reduce cohesion, because they inherently reference only one instance variable in a class; that is, an access method can complete its behavior by referencing only one instance variable. In addition, the concept of encapsulation in the object-oriented paradigm promotes the use of access methods to provide an access path to the instance variables, and some object-oriented languages, such as Trellis/Owl, even generate access methods automatically. However, the existing cohesion measures do not account for the nature of access methods, which leads to an unexpected reduction of cohesion [6, 11, 12]. For example, access methods cause an unintended increase in LCOM1 and LCOM2 and a decrease in Co¹ and TCC, because they increase the number of method pairs with no shared instance variables. Likewise, while an access method increases the number of possible interactions by the number of instance variables, it increases the number of actual interactions by only one; thus, LCOM5 increases and Coh decreases.

A delegation method achieves its behavior by delegating a message to another object, typically an instance variable of the class, and thus generally has one interaction with that instance variable. For instance, method IsEmpty() in class Stack is a delegation method because it just sends the message IsZero() to instance variable top in order to check whether the stack is empty. Delegation methods that simply pass a message through to another object turn out to be conventional in object-oriented systems [33]. In addition, some of the work on good object-oriented programming style seems to encourage the use of delegation methods [25], and the Law of Demeter, which restricts the objects to which messages may be sent, also fosters their use [28]. Like access methods, delegation methods inherently interact with only some of the instance variables. Therefore, the cohesion of a class should not be reduced by the existence of delegation methods; for example, the single invocation of IsZero() is sufficient to complete the behavior of method IsEmpty(). The existing cohesion measures do not take this characteristic of delegation methods into account.

The constructor and destructor of a class need to access only the instance variables essential for initializing and deinitializing its instances, respectively, so they may not inherently access all of the instance variables. For example, destructor ~Stack() references only instance variable items, in order to release the allocated space.
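To make the effect of an access method concrete, the sketch below recomputes LCOM1, LCOM5, and Coh for class Stack with and without GetTop(). It is an illustrative C++ sketch under our own assumptions: the reference sets are the direct references we read off Figure 1 (IsEmpty() counts as referencing top because it sends IsZero() to it), and `stackRefs`, `lcom1`, `lcom5`, and `coh` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Method name -> instance variables it references directly.
using RefMap = std::map<std::string, std::set<std::string>>;

// Direct references of the Stack methods as read off Figure 1 (our reading).
RefMap stackRefs(bool withAccessMethod) {
    RefMap refs = {
        {"Stack()", {"items", "top"}},   // constructor
        {"~Stack()", {"items"}},         // destructor
        {"IsEmpty()", {"top"}},          // delegation method
        {"Pop()", {"items", "top"}},
        {"Push()", {"items", "top"}},
    };
    if (withAccessMethod) refs["GetTop()"] = {"top"};   // access method
    return refs;
}

// LCOM1: number of method pairs that share no instance variable.
int lcom1(const RefMap& refs) {
    int p = 0;
    for (auto i = refs.begin(); i != refs.end(); ++i)
        for (auto j = std::next(i); j != refs.end(); ++j) {
            bool shared = false;
            for (const auto& v : i->second) shared = shared || j->second.count(v) > 0;
            if (!shared) ++p;
        }
    return p;
}

// Sum over variables A_j of mu(A_j), the number of methods referencing A_j.
double sumMu(const RefMap& refs, const std::vector<std::string>& vars) {
    double sum = 0.0;
    for (const auto& v : vars)
        for (const auto& entry : refs) sum += entry.second.count(v);
    return sum;
}

// LCOM5 = ((1/a) * sum mu(A_j) - m) / (1 - m); requires m > 1.
double lcom5(const RefMap& refs, const std::vector<std::string>& vars) {
    assert(refs.size() > 1 && !vars.empty());
    double m = static_cast<double>(refs.size());
    double a = static_cast<double>(vars.size());
    return (sumMu(refs, vars) / a - m) / (1.0 - m);
}

// Coh = sum mu(A_j) / (m * a).
double coh(const RefMap& refs, const std::vector<std::string>& vars) {
    assert(!refs.empty() && !vars.empty());
    return sumMu(refs, vars) / (static_cast<double>(refs.size()) * vars.size());
}
```

Under these assumptions, adding GetTop() raises LCOM1 from 1 to 2 and LCOM5 from 0.25 to 0.3, and lowers Coh from 0.8 to 0.75, which is the kind of reduction the text attributes to access methods.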
Some researchers have mentioned the problems caused by certain kinds of special methods. Briand et al. [6] noted that access methods artificially reduce cohesion and suggested excluding them in order to resolve the problem. Bieman et al. [3] also excluded constructors and destructors from LCC and TCC in order to remove the impact of the artificial connections introduced by those methods. However, since access methods, constructors, destructors, and delegation methods all have no influence on cohesiveness, the nature of all of these methods should be reflected in order to measure cohesiveness properly. In this paper, we introduce the notion of special methods to cope with the problems caused by access methods, constructors, destructors, and delegation methods. By incorporating this notion into the definition of our new cohesion measure, the new measure gives a more accurate measurement of the cohesiveness of a class.

¹ The original name of this measure is C.

2.2 Cohesion Criteria

The existing cohesion measures depend on either instance variable usage or the sharing of instance variables; that is, a class is more cohesive when a larger number of its instance variables are referenced by a method (e.g., LCOM5 and Coh), or when a larger number of method pairs share instance variables (e.g., LCOM1, LCOM2, LCOM3, LCOM4, Co, LCC, and TCC) [6]. We found that these criteria can lead to unexpected cohesion measurements. For example, Table 2 shows four classes A, B, C, and D, together with their values for the existing cohesion measures. A rectangle and an oval represent a method and an instance variable, respectively, and an edge denotes an interaction between them. The interaction patterns fall into two categories: the members of each of classes B, C, and D have some interactions among them, whereas class A contains two components that have no interactions between them. Therefore, from the viewpoint of cohesion as the relatedness among the members of a class, classes B, C, and D should be more cohesive than class A.

Table 2: Examples for cohesion measures
[Table content not reproduced: diagrams of (a) class A, (b) class B, (c) class C, and (d) class D, each with methods M1 to M4 and instance variables V1 to V3, and their values for LCOM1, LCOM2, LCOM3, LCOM4, Co, LCOM5, Coh, LCC, and TCC.]

However, LCOM1, LCOM2, LCOM5, Coh, and TCC cannot differentiate the cohesiveness of these classes; that is, the marked values in Table 2 are identical for classes A and B. In particular, the drawback of LCOM2 has been discussed in several papers [6, 12, 20, 22, 23] and was further manifested in the experiment performed by Basili et al. [1], where LCOM2 was zero for many classes even though different degrees of cohesion were expected. LCOM3, LCOM4, and LCC have another problem: they always assign the maximum cohesion to classes whose members are related. However, classes whose members are related can show very different cohesiveness [12, 22]. For example, the members of each of classes B, C, and D in Table 2 interact with one another; that is, they all show connected interaction patterns. However, the members of class C have tighter interactions than those of class B, so intuitively class C should be more cohesive than class B. Similarly, class D should be more cohesive than class C. As the marked values in Table 2 show, LCOM3, LCOM4, and LCC produce identical cohesion values without differentiating the cohesiveness of these classes. Although Co was proposed to overcome this deficiency of LCOM4, it still does not differentiate between classes C and D.

These counter-intuitive cases result from the cohesion criteria adopted by the existing measures, which simply count the number of instance variables used by methods or the number of method pairs with shared instance variables. We note that such simple counting approaches, with no consideration of the interaction patterns among class members, cause the problems mentioned above. In this paper, we employ a new cohesion criterion, the connectivity among the members of a class; that is, a class is more cohesive when its members are more strongly connected.
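The saturation of LCC can be seen on two small hypothetical classes: a chain in which the first and third methods are connected only through the second, and a class in which every method uses every variable. This C++ sketch is illustrative only; it assumes all methods are public and treats two methods as directly connected only when they themselves reference a common instance variable (connections made through invoked methods are ignored). The function names are ours.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical encoding: refs[i] = instance variables used by public method i.
using RefSets = std::vector<std::set<int>>;

// Simplification: a direct connection means a shared instance variable.
bool directlyConnected(const RefSets& refs, int i, int j) {
    for (int v : refs[i])
        if (refs[j].count(v) > 0) return true;
    return false;
}

// TCC = NDC / NP, with NP = N(N-1)/2.
double tcc(const RefSets& refs) {
    int n = static_cast<int>(refs.size());
    assert(n >= 2);
    int ndc = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (directlyConnected(refs, i, j)) ++ndc;
    return static_cast<double>(ndc) / (n * (n - 1) / 2);
}

// LCC = NIC / NP, where NIC counts pairs joined by a chain of direct connections.
double lcc(const RefSets& refs) {
    int n = static_cast<int>(refs.size());
    assert(n >= 2);
    // reach[i][j] ends up true when j is reachable from i (Warshall's algorithm).
    std::vector<std::vector<bool>> reach(n, std::vector<bool>(n, false));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            reach[i][j] = (i != j) && directlyConnected(refs, i, j);
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (reach[i][k] && reach[k][j]) reach[i][j] = true;
    int nic = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (reach[i][j]) ++nic;
    return static_cast<double>(nic) / (n * (n - 1) / 2);
}
```

Both classes receive LCC = 1, even though the chain is clearly more loosely held together; TCC is 2/3 for the chain and 1 for the fully connected class.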
This new cohesion criterion not only conforms to the view that cohesion denotes the degree of relatedness among the members, but can also overcome the problems mentioned above.

3 New Cohesion Measure

In this section, we describe a new cohesion measure for classes. As preparation for defining the measure, we present a few basic definitions.

3.1 Basic Definitions

Definition 3.1 A class C consists of the set of instance variables V(C) and the set of methods M(C).

Definition 3.2 For a method Mi, let R(Mi) be the set of instance variables directly referenced by Mi, and let I(Mi) be the set of methods directly invoked by Mi. Then I*(Mi) is the set of methods directly or indirectly invoked by Mi:

$$I^*(M_i) = \{\, m \in M(C) \mid m \in I(M_i) \lor \exists\, m_i \in I(M_i) : m \in I^*(m_i) \,\}$$

and R*(Mi) is the set of instance variables directly or indirectly referenced by Mi:

$$R^*(M_i) = \{\, v \in V(C) \mid v \in R(M_i) \lor \exists\, m_i \in I^*(M_i) : v \in R(m_i) \,\}$$
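Both closures in Definition 3.2 can be computed with a graph traversal. The following C++ sketch assumes a hypothetical encoding in which methods and instance variables are numbered from 0, `R[i]` holds the variables directly referenced by method i, and `I[i]` the methods it directly invokes; it runs a depth-first search over the invocation relation to obtain I*(Mi) and then unions the direct references of every reachable method into R*(Mi).

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical encoding: methods and instance variables are numbered from 0;
// R[i] = variables directly referenced by method i, I[i] = methods it directly invokes.
using IdSets = std::vector<std::set<int>>;

// I*(Mi): every method reachable from Mi through one or more invocations.
std::set<int> invokedClosure(const IdSets& I, int mi) {
    assert(mi >= 0 && mi < static_cast<int>(I.size()));
    std::set<int> visited;
    std::vector<int> pending(I[mi].begin(), I[mi].end());   // depth-first work list
    while (!pending.empty()) {
        int m = pending.back();
        pending.pop_back();
        if (!visited.insert(m).second) continue;   // already expanded
        for (int callee : I[m]) pending.push_back(callee);
    }
    return visited;
}

// R*(Mi): variables referenced by Mi itself or by any method in I*(Mi).
std::set<int> referencedClosure(const IdSets& R, const IdSets& I, int mi) {
    std::set<int> result = R[mi];
    for (int m : invokedClosure(I, mi))
        result.insert(R[m].begin(), R[m].end());
    return result;
}
```

In a hypothetical class where method 0 calls method 1, which calls method 2, method 0 references no variable itself but still has R* = {0, 1} through its callees.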

Definition 3.3 A method in a class is special if it is an access method, a delegation method, a constructor, or a destructor of the class. A method is called normal if it is not special.

Therefore, M(C) can be partitioned into the set of special methods Ms(C) and the set of normal methods Mn(C); that is, $M_s(C) \cap M_n(C) = \emptyset$ and $M_s(C) \cup M_n(C) = M(C)$.

Definition 3.4 The reference graph of class C, Gr(C), represents the interactions among the members of class C and is defined to be an undirected graph G = (N, A) with

$$N = V(C) \cup M(C)$$

$$A = \{\, (m, v) \mid v \in R^*(m),\ m \in M(C),\ v \in V(C) \,\}$$

For example, Figure 2 shows the reference graph of class Stack defined in Figure 1. It represents the interactions between methods Stack(), ~Stack(), Push(), Pop(), IsEmpty(), and GetTop() and instance variables items and top. Special methods are depicted as rectangles with rounded corners.

Figure 2: The reference graph of class Stack

A class would be most cohesive if its members had all possible interactions among them. However, special methods complete their behaviors by interacting with only some of the instance variables in a class, so they do not need to interact with all of them. Thus, a class is defined to be most cohesive when its normal methods interact with all of its instance variables.

Definition 3.5 A reference graph Gr = (N, A) is a Most Cohesive Component (MCC) if each normal method in Mn(Gr) interacts with all of the instance variables in V(Gr):

$$A = \{\, (m, v) \mid m \in M_n(G_r),\ v \in V(G_r) \,\}$$

where Mn(Gr) and V(Gr) denote the set of normal methods and the set of instance variables in Gr, respectively.

For instance, the reference graph of class Stack in Figure 2 is an MCC, because each of the normal methods Push() and Pop() references both instance variables, items and top.
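Definition 3.5 reduces to a subset test on the R* sets of the normal methods. The minimal C++ sketch below assumes those sets are already known (for example from the closures of Definition 3.2) and that special methods are simply left out of the input, since the definition ignores them; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>

using Names = std::set<std::string>;

// MCC test: every normal method interacts with every instance variable.
// normalRStar maps each normal method to its R* set; special methods are omitted.
bool isMostCohesiveComponent(const std::map<std::string, Names>& normalRStar,
                             const Names& vars) {
    for (const auto& entry : normalRStar)
        if (!std::includes(entry.second.begin(), entry.second.end(),
                           vars.begin(), vars.end()))
            return false;   // this normal method misses at least one variable
    return true;
}
```

For class Stack, Push() and Pop() both reach items and top, so the check succeeds; if Pop() reached only items, it would fail.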

3.2 Connectivity of Class Members

Our cohesion measure is designed to incorporate our observations on the nature of classes into its definition. We believe that the cohesiveness of a class depends not only on the number of interactions among its members but also on the patterns of those interactions. More specifically, the connectivity among class members influences cohesiveness; that is, the more strongly the members of a class are connected, the more cohesive the class is. For example, consider again classes A, B, C, and D in Table 2. Class A has a disjoint interaction pattern; that is, its reference graph Gr(A) is already disjoint. In other words, some members of class A have no relationship with the rest of its members. Gr(B) can be decomposed into two sub-reference graphs if either method M2 or M3 is removed, and Gr(C) becomes disjoint when both methods M1 and M2 are removed. In the case of Gr(D), all the methods must be removed to make Gr(D) disjoint. In a sense, the members of classes A, B, C, and D are held together by zero, one, two, and four methods, respectively. Therefore, the members of class D are more strongly connected than those of class C, the members of class C more strongly than those of class B, and the members of class B more strongly than those of class A. Thus, it can be claimed that class D is more cohesive than class C, class C more cohesive than class B, and class B more cohesive than class A.

Generally, a class has a subset of methods without which its reference graph becomes disjoint. We call the minimum set of methods without which a reference graph Gr can be divided into sub-reference graphs the glue methods of Gr, denoted Mg(Gr). To represent how strongly the members of a class are connected by the glue methods, we introduce the notion of the connectivity factor.

Definition 3.6 The connectivity factor of a reference graph Gr, Fc(Gr), represents the degree of connectivity among its members and is defined to be the ratio of the number of glue methods |Mg(Gr)| to the number of normal methods |Mn(Gr)|:

$$F_c(G_r) = \frac{|M_g(G_r)|}{|M_n(G_r)|} \qquad (1)$$

For example, the connectivity factors of classes A, B, C, and D in Table 2 are 0/4, 1/4, 2/4, and 4/4, respectively: each class has four normal methods, and the numbers of glue methods are zero, one (M2 or M3), two (M1 and M2), and four (M1, M2, M3, and M4), respectively. The characteristic of special methods is incorporated into the definition of the connectivity factor: special methods have no impact on it. For example, the connectivity factor of class Stack in Figure 2 is one (= 2/2); the special methods Stack(), ~Stack(), IsEmpty(), and GetTop() do not affect it. The connectivity factor ranges from zero to one. It takes the minimum value, zero, for a class with a disjoint interaction pattern and the maximum value, one, for a class with the maximum interaction pattern. The connectivity factor of a disjoint class (e.g., class A in Table 2) is zero because there is no glue method; no method needs to be removed to separate the class. The connectivity factor of an MCC (e.g., class D in Table 2 and class Stack in Figure 2) is one, because all of its normal methods are also glue methods. When a reference graph consists of only instance variables or only methods, it is defined to have the maximum connectivity factor if it has a single member; otherwise, its connectivity factor is zero, because there are no other members that can bind its members together. In addition, the connectivity factor of a class reflects its cohesiveness: a high connectivity factor means that the members of the class are strongly related to each other. Therefore, the cohesion of a class can be defined in terms of its connectivity factor.
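For small classes, the glue methods can be found by brute force: try every subset of normal methods and keep the smallest one whose removal disconnects the reference graph. The C++ sketch below does this and then applies Equation 1 together with the single-member rule stated above. It is a hypothetical illustration, not the paper's tool: special methods are assumed to have been removed already, each normal method is given by its R* set over variables numbered from 0, and the search is exponential in the number of normal methods.

```cpp
#include <bitset>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical encoding: rstar[k] = R* set of normal method k over variables
// numbered 0..numVars-1. Special methods are assumed to be dropped beforehand.
using RStar = std::vector<std::set<int>>;

// Connected components among all variables and the methods not in removedMask.
int countComponents(const RStar& rstar, int numVars, unsigned removedMask) {
    int n = static_cast<int>(rstar.size());
    std::vector<int> parent(numVars + n);   // nodes: variables first, then methods
    for (int i = 0; i < numVars + n; ++i) parent[i] = i;
    int components = numVars;
    for (int k = 0; k < n; ++k) {
        if (removedMask & (1u << k)) continue;
        ++components;                        // the method node itself
        for (int v : rstar[k]) {
            int a = numVars + k, b = v;
            while (parent[a] != a) a = parent[a];
            while (parent[b] != b) b = parent[b];
            if (a != b) { parent[a] = b; --components; }
        }
    }
    return components;
}

// |Mg|: size of the smallest set of normal methods whose removal disconnects
// the graph; 0 if it is already disconnected, n if no smaller set suffices.
int glueCount(const RStar& rstar, int numVars) {
    int n = static_cast<int>(rstar.size());
    assert(n < 31);                          // brute force over all 2^n subsets
    if (countComponents(rstar, numVars, 0u) > 1) return 0;
    int best = n;
    for (unsigned mask = 1; mask < (1u << n); ++mask) {
        int size = static_cast<int>(std::bitset<32>(mask).count());
        if (size < best && countComponents(rstar, numVars, mask) > 1) best = size;
    }
    return best;
}

// Fc = |Mg| / |Mn| (Equation 1), plus the single-member rule for method-free graphs.
double connectivityFactor(const RStar& rstar, int numVars) {
    int n = static_cast<int>(rstar.size());
    assert(n + numVars > 0);
    if (n == 0) return numVars == 1 ? 1.0 : 0.0;
    return static_cast<double>(glueCount(rstar, numVars)) / n;
}
```

With items numbered 0 and top numbered 1, the normal methods Push() and Pop() of class Stack both have R* = {0, 1}, so the sketch reports Fc = 2/2 = 1, as in the text; a three-method chain in which only the middle method touches both variables yields Fc = 1/3.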

3.3 Hierarchical Structure of Interaction Patterns

A class can be decomposed into several components by removing its glue methods. We use the term component for a sub-reference graph obtained by decomposing a reference graph through the removal of its glue methods. Each component contributes differently to the cohesiveness of a class. Components that are not MCCs tend to reduce the cohesiveness of a class to some degree, because the class then contains components of weak cohesion. It is therefore desirable to consider the interaction patterns of the components of a class when evaluating its overall cohesiveness. We construct the structure tree of a class to analyze the hierarchical structure of its interaction patterns. The structure tree of a class describes how the class is decomposed into its constituent components as the glue methods are removed.

Definition 3.7 The structure tree for a reference graph Gr, Ts(Gr), is a tree T = (N, A) with root node Gr, where

$$N = \{\, G_{ri} \mid G_{ri} \subseteq G_r \,\}$$

$$A = \{\, (G_{rp}, G_{rc}) \mid G_{rc} \text{ is one of the connected sub-reference graphs obtained from } G_{rp} \text{ by removing } M_g(G_{rp}) \,\}$$

and all the leaves are MCCs.

The structure tree for a reference graph is constructed from the components obtained by removing the glue methods, along with their interactions, from the reference graph. This decomposition procedure is applied recursively to the components until each of them is an MCC. In the end, the structure tree of a class describes how the class can be decomposed into a collection of MCCs. The decomposition of a reference graph is not always unique, because a reference graph can be separated by several different sets of glue methods; in that case, the set of glue methods that results in the higher cohesion for the children is selected. For example, Figure 3 depicts the structure tree of class E, whose root is the reference graph Gr(E). The reference graph of class E can be divided into two components, Gr1 and Gr2, by removing the glue method M3 along with its interactions. Gr2 is not decomposed further because it is an MCC. However, Gr1 can be partitioned into three components, Gr11, Gr12, and Gr13, by eliminating the glue method M2. After this decomposition, each of Gr11, Gr12, and Gr13 is an MCC, and the construction of the structure tree is finished. We believe that the cohesion of a class is affected not only by its connectivity factor but also by the cohesion of its components in the structure tree. Components of weak cohesion reduce the cohesion of a class, because the class then consists of less cohesive components. For example, the cohesion of class E should be reduced from its connectivity factor, 1/5, because class E consists of two components, one of which (Gr1) is not an MCC and has weak cohesion. Therefore, to evaluate the overall cohesiveness of a class properly, the cohesiveness of its components should be considered as well as its connectivity factor. To indicate the cohesiveness of the components, we introduce the structure factor of a reference graph.
The structure factor of a reference graph is taken as the average cohesion of its components, so that it is normalized between zero and one and takes into account the cohesiveness of all the components. Formally, the structure factor is defined as follows:

Definition 3.8 The structure factor of a reference graph Gr, Fs(Gr), is defined to be the average cohesion of its children in the structure tree:

$$F_s(G_r) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{CBMC}(G_{ri}) \qquad (2)$$

where Gri is one of the n children of Gr in the structure tree, and CBMC(Gri) denotes the cohesion of component Gri.

[Figure not reproduced: the root Gr(E), with instance variables V1 to V6 and methods M1 to M5; its children Gr1 and Gr2; Gr1 is further decomposed into Gr11, Gr12, and Gr13.]
Figure 3: The structure tree for class E

The structure factor of a reference graph is defined in terms of our cohesion measure, CBMC (Cohesion Based on Member Connectivity), applied to its children in the structure tree. CBMC is defined to range from zero to one in the subsequent definition (see Definition 3.9). Thus, the structure factor ranges from zero³ to one. The maximum structure factor indicates that every component has the maximum cohesion; conversely, the existence of components of weak cohesion leads to a structure factor of less than one. Using the connectivity factor and the structure factor, our cohesion measure CBMC is defined as follows:

Definition 3.9 The CBMC of a class C, CBMC(C), is defined to be the connectivity factor of its reference graph, Fc(Gr(C)), scaled by the structure factor of its reference graph, Fs(Gr(C)):

$$\mathrm{CBMC}(C) = F_c(G_r(C)) \times F_s(G_r(C)) = F_c(G_r(C)) \times \frac{1}{n} \sum_{i=1}^{n} \mathrm{CBMC}(G_{ri}(C)) \qquad (3)$$

³ Strictly, the structure factor cannot be zero, because each of the children has a cohesion greater than zero.
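Once the structure tree and the connectivity factor of every node are known, Equation 3 can be evaluated bottom-up. The C++ sketch below assumes exactly that: `TreeNode` is a hypothetical type holding a node's Fc and its children, and, following Figure 4, a leaf (an MCC) contributes its own connectivity factor. The test uses the factors of class E from Figure 4: 1/5 at the root, 1/2 for Gr1, and 1 for the MCCs Gr11, Gr12, Gr13, and Gr2.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical node of a structure tree whose connectivity factors are already known.
struct TreeNode {
    double fc;                        // connectivity factor Fc of this component
    std::vector<TreeNode> children;   // components left after removing its glue methods
};

// CBMC(G) = Fc(G) * Fs(G), where Fs is the mean CBMC of the children (Equations 2, 3).
// A leaf (an MCC) has no children; as in Figure 4, its CBMC is taken to be its Fc.
double cbmc(const TreeNode& node) {
    assert(node.fc >= 0.0 && node.fc <= 1.0);
    if (node.children.empty()) return node.fc;
    double sum = 0.0;
    for (const TreeNode& child : node.children) sum += cbmc(child);
    return node.fc * (sum / node.children.size());
}
```

For class E the sketch yields CBMC(Gr1) = 1/2 and CBMC(E) = 1/5 × 3/4 = 0.15.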

The connectivity factor of a class indicates the strength of the relationships among its members, and the structure factor denotes the degree to which its components contribute to its overall cohesiveness. The maximum structure factor means that all the components of the class have the maximum cohesion, so the cohesion of the class need not be reduced from its connectivity factor. On the other hand, a structure factor of less than one reduces the cohesion of the class, because it indicates that the class has components of weak cohesion. Moreover, the degree of reduction depends on how weakly cohesive those components are: the lower the structure factor, the less cohesive the components of the class, and thus the more the cohesion value of the class is reduced. For example, CBMC of class E in Figure 3 is computed as in Figure 4: CBMC(E) = 1/5 × 3/4 (= 3/20), where 1/5 is the connectivity factor of Gr(E) and 3/4 is the structure factor of Gr(E). The cohesion is reduced from the connectivity factor 1/5 by the structure factor 3/4 because of the weakly cohesive component Gr1. The CBMC values of classes A, B⁴, C, and D in Table 2 are 0/4 × 5/6, 1/4 × 3/4, 2/4 × 3/4, and 4/4, respectively. In summary, to reflect the characteristics of classes, CBMC was designed to have the following properties:

Special methods have no influence on cohesiveness. We define CBMC to be independent of special methods by ignoring them in the definition of MCC (see Definition 3.5) and of the connectivity factor (see Equation 1).

The cohesion of a class depends on the connectivity of the class members; that is, the relations among class members are captured by connectivity, not by the existing criteria (i.e., sharing of instance variables or instance variable usage). Therefore, CBMC is defined to be proportional to the connectivity factor.

A class with less cohesive components is less cohesive. The structure factor indicates the cohesiveness of the components of the class: a component of weak cohesion produces a small structure factor. Since CBMC is defined by scaling the connectivity factor by the structure factor, the cohesion of a class with less cohesive components is reduced more than that of a class with more cohesive components.

4 Because methods M2 and M3 result in the same structure factor, either of them can be selected as the glue method.

$$
\begin{aligned}
\mathrm{CBMC}(E) &= F_c(G_r(E)) \times F_s(G_r(E)) \\
&= F_c(G_r(E)) \times \tfrac{1}{2}\left[\mathrm{CBMC}(G_r^1(E)) + \mathrm{CBMC}(G_r^2(E))\right] \\
&= F_c(G_r(E)) \times \tfrac{1}{2}\left[F_c(G_r^1(E)) \times \tfrac{1}{3}\left(\mathrm{CBMC}(G_r^{11}(E)) + \mathrm{CBMC}(G_r^{12}(E)) + \mathrm{CBMC}(G_r^{13}(E))\right) + F_c(G_r^2(E))\right] \\
&= F_c(G_r(E)) \times \tfrac{1}{2}\left[F_c(G_r^1(E)) \times \tfrac{1}{3}\left(F_c(G_r^{11}(E)) + F_c(G_r^{12}(E)) + F_c(G_r^{13}(E))\right) + F_c(G_r^2(E))\right] \\
&= \tfrac{1}{5} \times \tfrac{1}{2}\left[\tfrac{1}{2} \times \tfrac{1}{3}\,(1 + 1 + 1) + 1\right] \\
&= \tfrac{1}{5} \times \tfrac{3}{4}
\end{aligned}
$$

Figure 4: The computation of CBMC for class E
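The recursion of Equation (3) can be checked mechanically. The Python sketch below evaluates CBMC over a structure tree whose nodes carry their connectivity factors; it assumes, as the last steps of Figure 4 do, that a leaf component (an MCC) contributes its connectivity factor directly. The tree for class E is transcribed from Figure 4 (Fc(Gr(E)) = 1/5, Fc(Gr1(E)) = 1/2, and all other factors equal to 1). The `Node` type and the `cbmc_from_tree` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List


@dataclass
class Node:
    """A reference graph in the structure tree (hypothetical representation)."""
    fc: Fraction  # connectivity factor Fc of this graph
    children: List["Node"] = field(default_factory=list)  # components after removing glue methods


def cbmc_from_tree(node: Node) -> Fraction:
    """Equation (3): CBMC = Fc * (average CBMC of the children)."""
    if not node.children:
        # Assumption: a leaf (an MCC) contributes its connectivity factor, which is 1.
        return node.fc
    fs = sum(cbmc_from_tree(c) for c in node.children) / len(node.children)
    return node.fc * fs


# Structure tree of class E, transcribed from Figure 4.
gr1 = Node(Fraction(1, 2), [Node(Fraction(1)), Node(Fraction(1)), Node(Fraction(1))])
gr2 = Node(Fraction(1))
gr_e = Node(Fraction(1, 5), [gr1, gr2])

if __name__ == "__main__":
    print(cbmc_from_tree(gr_e))  # 3/20, i.e. 1/5 x 3/4
```

Exact fractions are used so that intermediate structure factors such as 3/4 are not rounded.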

3.4 Application of Cohesion Measure

The cohesion measure CBMC indicates the overall cohesiveness of a class. In addition, the connectivity and structure factors can be used to capture some aspects of class design: the combination of the connectivity factor Fc and the structure factor Fs gives insight into the design quality of a class.

Case I: High Fc and High Fs. In this case, the members of a class are tightly bound by a number of glue methods, and when the class is split into several components, each component also shows high cohesion. An MCC, which shows ideal cohesion, belongs to this case. A class in this case can be considered well designed; that is, the class seems to properly capture the features of the corresponding objects. Therefore, no special action is needed to improve its quality.

Case II: Low Fc and High Fs. The class can easily be partitioned into several components of high cohesion by removing a few glue methods. Each highly cohesive component may capture a class that was missed in the design phase. The original class can be restructured by creating a new class for each component and replacing the component with an instance variable that holds an instance of the newly created class.

Case III: High Fc and Low Fs. The members of a class are tightly related by a number of glue methods, while the partitioned components can themselves be easily split by removing a few glue methods. This case also indicates a poorly designed class, although such classes seem unlikely to occur in practice.

Case IV: Low Fc and Low Fs. In this case, a class is just a collection of members that have little relationship to one another. The class therefore seems unnecessary, and its members should be moved to other relevant classes.
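Section 3.4 does not fix numeric limits for "high" and "low". The Python sketch below makes the four cases concrete under a hypothetical cut-off of 0.5 for both factors; `design_case` and `THRESHOLD` are illustrative names, not part of HYSS. Under this assumption, class E from Figure 4 (Fc = 1/5, Fs = 3/4) falls into Case II.

```python
# Hypothetical cut-off: the section does not define numeric limits for "high" and "low".
THRESHOLD = 0.5


def design_case(fc: float, fs: float, threshold: float = THRESHOLD) -> str:
    """Map a class's connectivity factor Fc and structure factor Fs to Case I-IV."""
    high_fc = fc >= threshold
    high_fs = fs >= threshold
    if high_fc and high_fs:
        return "I"    # well designed; no action needed
    if not high_fc and high_fs:
        return "II"   # candidate for splitting along the glue methods
    if high_fc and not high_fs:
        return "III"  # poorly designed, but rare in practice
    return "IV"       # unrelated members; move them to other classes


if __name__ == "__main__":
    print(design_case(1 / 5, 3 / 4))  # class E from Figure 4: prints II
```

A single shared threshold keeps the sketch simple; separate cut-offs for Fc and Fs would be an equally plausible choice.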

4 HYSS: A Cohesion Measurement Tool

We have developed a cohesion measurement tool, named HYSS, to automate the computation of the major existing cohesion measures, including CBMC, for C++ programs. HYSS was implemented in C++ under UNIX, and Motif was used for the graphical user interface. The current version of HYSS is available at our web site http://salmosa.kaist.ac.kr/hschae/HYSS/. Figure 5 shows snapshots of HYSS. Figure 5 (a) shows the two windows that appear when HYSS is executed on InterViews: a class browser window for browsing classes and their members in source programs, and a cohesion measurement window for displaying the computed values of all the major measures. Figure 5 (b) shows the detailed result of CBMC for class E in Figure 3. HYSS accepts a C++ source program and extracts information on classes, such as instance variables, methods, and the interactions among them, by using GEN++ [17]. A reference graph is constructed for each class from the information extracted from the C++ source program. Figure 6 describes an algorithm for computing CBMC for a reference graph Gr. When Gr is an MCC, CBMC is one, and when Gr is disjoint, CBMC is zero. Otherwise, glue methods are identified by attempting to decompose Gr. Once the glue methods are determined, the connectivity factor is calculated as the number of glue methods relative to the number of normal methods in Gr.

(a) Windows for class browsing and cohesion measurement

(b) Detailed result of CBMC for class E

Figure 5: Sample snapshots of HYSS operation


function CBMC(Gr)
  switch Gr of
    case Gr is MCC: return 1
    case Gr is disjoint: return 0
    otherwise:  // Gr is connected, but not an MCC
      for k = 1 to |M(Gr)| - 1
        for m = 1 to C(|M(Gr)|, k)  // C(a, b): number of b-element subsets of a set of size a
          Mg(Gr) = select a different set of k methods from Gr
          if Gr can be decomposed into Gr^1, Gr^2, ..., Gr^n by removing Mg(Gr) then
            Fs[m] = (1/n) * sum over 1 <= i <= n of CBMC(Gr^i)
          else
            Fs[m] = 0
          endif
        endfor
        if Gr was decomposed by removing k methods then
          Fc = k / |Mn(Gr)|
          Fs = max over 1 <= m <= C(|M(Gr)|, k) of Fs[m]  // choose the maximum structure factor
          return Fc * Fs
        endif
      endfor
  endswitch
end

Figure 6: Algorithm for the computation of CBMC

The structure factor is computed as the average CBMC of the decomposed components, by applying this algorithm recursively to each component. When there is more than one set of glue methods, we choose the glue methods that lead to the greatest cohesion, that is, the greatest structure factor.
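The procedure of Figure 6 can be sketched in Python as follows. This is an illustrative re-implementation, not HYSS itself, and it rests on several assumptions: special methods have already been removed; a reference graph is represented as a hypothetical mapping from each normal method to the instance variables it references; an MCC is taken to be a graph in which every method references every instance variable; and an instance variable left with no method after the glue methods are removed counts as a component of its own, which guarantees that some glue set always exists. Like HYSS, the sketch enumerates candidate glue sets exhaustively. It tests disjointness before the MCC condition so that degenerate inputs (for example, methods with no instance variables) score zero; for ordinary graphs the order does not matter.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: normal method -> instance variables it references.
Graph = Dict[str, FrozenSet[str]]
Node = Tuple[str, str]  # ("m", name) for a method, ("v", name) for an instance variable


def components(graph: Graph, ivars: Set[str]) -> List[Tuple[Graph, Set[str]]]:
    """Connected components of the bipartite method/instance-variable graph."""
    adj: Dict[Node, Set[Node]] = {("v", v): set() for v in ivars}
    for m, refs in graph.items():
        adj[("m", m)] = {("v", v) for v in refs}
        for v in refs:
            adj.setdefault(("v", v), set()).add(("m", m))
    seen: Set[Node] = set()
    result: List[Tuple[Graph, Set[str]]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        methods = {name: graph[name] for kind, name in comp if kind == "m"}
        variables = {name for kind, name in comp if kind == "v"}
        result.append((methods, variables))
    return result


def cbmc(graph: Graph, ivars: Set[str]) -> Fraction:
    """CBMC of a reference graph, following the structure of Figure 6."""
    if len(components(graph, ivars)) > 1:
        return Fraction(0)  # disjoint
    if all(refs == ivars for refs in graph.values()):
        return Fraction(1)  # assumed MCC: every method references every variable
    methods = list(graph)
    for k in range(1, len(methods)):  # smallest glue sets first
        best = None
        for glue in combinations(methods, k):
            rest = {m: refs for m, refs in graph.items() if m not in glue}
            parts = components(rest, ivars)
            if len(parts) > 1:  # removing `glue` decomposes the graph
                fs = sum(cbmc(pm, pv) for pm, pv in parts) / len(parts)
                best = fs if best is None else max(best, fs)
        if best is not None:
            return Fraction(k, len(methods)) * best  # Fc * Fs
    raise ValueError("unreachable: a connected non-MCC graph always has glue methods")


if __name__ == "__main__":
    chain = {"m1": frozenset({"a", "b"}), "m2": frozenset({"b", "c"}), "m3": frozenset({"c", "d"})}
    print(cbmc(chain, {"a", "b", "c", "d"}))  # 1/3
```

In the usage example, removing the middle method m2 splits the chain into two MCCs, giving Fc = 1/3 and Fs = 1; removing m1 or m3 also decomposes the graph but yields the smaller structure factor 3/4, so m2 is chosen as the glue method.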

4.1 Identification of Special Methods

Our cohesion measure is unique in its treatment of special methods, and hence their correct identification is crucial. Constructors and destructors can be identified easily: we only have to look at the syntax of C++ classes, since the constructors and the destructor of class C are named C() and ~C(), respectively. However, access methods and delegation methods are rather difficult to identify. We have to examine a method closely with respect to its usage of instance variables and the kinds of statements it contains. Access methods either update or retrieve the value of an instance variable, so an access method usually has a body in one of the following forms:

Assignment statement: an-instance-variable = value

Return statement: return an-instance-variable

In HYSS, a method is regarded as an access method if it consists of only one return statement or assignment statement and references only one instance variable. A delegation method in a class just delegates its behavior to an instance variable of the class. Therefore, the body of a delegation method has the following form:

Method invocation on an instance variable: an-instance-variable[.|->]some-method()

In HYSS, a method is regarded as a delegation method if it consists of only one method invocation on an instance variable. We consider these heuristics appropriate for identifying access methods and delegation methods. Figure 7 shows the procedure for determining whether a given method Mi in class C is special or normal.
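The heuristic of Figure 7 can be sketched in Python as below. The sketch assumes that each method body has already been parsed into a list of statement kinds and a set of referenced instance variables (HYSS obtains this information with GEN++, which is not reproduced here); `Stmt`, `Method`, and `kind_of_method` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class Stmt(Enum):
    """Statement kinds distinguished by the heuristic (hypothetical encoding)."""
    ASSIGNMENT = "assignment"
    RETURN = "return"
    INVOCATION_ON_IVAR = "method invocation on an instance variable"
    OTHER = "other"


@dataclass
class Method:
    name: str
    statements: List[Stmt]  # one entry per statement in the body
    ivars: Set[str] = field(default_factory=set)  # instance variables referenced


def kind_of_method(method: Method, class_name: str) -> str:
    """Classify a C++ method following the heuristic of Figure 7."""
    if method.name == class_name:
        return "Constructor"
    if method.name == "~" + class_name:
        return "Destructor"
    # Only single-statement bodies touching exactly one instance variable can be special.
    if len(method.statements) == 1 and len(method.ivars) == 1:
        kind = method.statements[0]
        if kind in (Stmt.ASSIGNMENT, Stmt.RETURN):
            return "Access Method"
        if kind is Stmt.INVOCATION_ON_IVAR:
            return "Delegation Method"
    return "Normal Method"


if __name__ == "__main__":
    # A hypothetical class Point with instance variables x_ and canvas_.
    print(kind_of_method(Method("x", [Stmt.RETURN], {"x_"}), "Point"))
    print(kind_of_method(Method("draw", [Stmt.INVOCATION_ON_IVAR], {"canvas_"}), "Point"))
```

The first call prints "Access Method" and the second "Delegation Method"; any body with more than one statement or more than one referenced instance variable falls through to "Normal Method".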

4.2 Considerations for C++ Classes

The following adaptations have been made to apply our cohesion measure to C++ classes.

Non-public methods.

C++ allows some methods to be hidden in a class by declaring them "private" or "protected", so that they can be invoked only by other methods within that class or its subclasses. Such non-public methods do not provide public services to instantiating clients. Non-public methods do not directly capture the behavior of objects; they are often introduced to make the implementation of public methods more convenient. In fact, they are more likely to interact with only some of the instance variables in the class, and the class could be implemented even without them. Whether to include or exclude non-public methods depends on the purpose of a cohesion measure. If the measure is used to determine the structural properties characterized by all the members of a class, the non-public methods should also be considered. However, CBMC takes the viewpoint that a class represents some objects, with its members capturing the state and behavior of those objects. Therefore, only the public methods, which capture the behavior of the objects, are considered directly, and the non-public methods are considered indirectly, through the connections they establish between public methods and instance variables. Bieman et al. [3, 29] share the view that class cohesion should be defined in terms of public methods: in LCC and TCC, they consider the connections between public methods and exclude the non-public methods.

function KindOfMethod(Method Mi, Class C)
  if (name of Mi = name of C) then return Constructor
  if (name of Mi = "~" + name of C) then return Destructor
  RefCount = the number of instance variables referenced by method Mi
  StmtCount = the number of statements in method Mi
  StmtType = the kind of the single statement in method Mi
  if (StmtCount = 1) and (RefCount = 1) then
    if (StmtType = Assignment Statement or Return Statement) then
      return Access Method
    if (StmtType = Method invocation on an instance variable) then
      return Delegation Method
  endif
  return Normal Method
end

Figure 7: Determination of special methods

Inherited members. As Briand et al. [6] mentioned, there are two options concerning the members inherited from superclasses, each with a different intention:

- Exclude inherited members from the analysis, in order to analyze to what degree the extension made by the class represents a single semantic concept.
- Include inherited members in the analysis, in order to analyze whether the class as a whole still represents a single semantic concept.

Since our cohesion measure concentrates on whether a class captures the features of objects, and those features are realized by the members including the inherited ones, we choose a variation of the inclusion strategy. In the case of instance variables, all the inherited ones are included in the analysis, because each of them captures state information of objects. In the case of methods, however, we include only the inherited methods that can be invoked publicly. Inherited but non-public methods are not included, because they do not capture the behavior of objects instantiated from the subclass. In C++ terms, the public methods inherited from a superclass are included only if the subclass is derived using the "public" specifier. Our treatment of non-public methods and inherited members rests on one consistent idea: the members that actually capture the features of the corresponding objects are considered directly in cohesion measurement.
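The treatment of non-public methods described in this section (only public methods become nodes, while non-public helpers are accounted for indirectly) can be sketched as a graph-building step. This is one plausible reading, not the HYSS implementation. `MethodInfo` and `public_reference_graph` are hypothetical names. The sketch assumes that a public method takes on the instance variables reached through the non-public methods it calls, transitively; that calls to other public methods are not folded in; and that inherited members have already been merged into the input, with an inherited method marked public only if it is public in the superclass and inherited through public derivation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class MethodInfo:
    public: bool  # publicly invocable on objects of the class
    ivars: Set[str] = field(default_factory=set)  # instance variables used directly
    calls: Set[str] = field(default_factory=set)  # same-class methods it invokes


def public_reference_graph(methods: Dict[str, MethodInfo]) -> Dict[str, Set[str]]:
    """Keep only public methods as nodes; fold in the instance variables reached
    through (transitively) called non-public methods."""
    graph: Dict[str, Set[str]] = {}
    for name, info in methods.items():
        if not info.public:
            continue
        reached = set(info.ivars)
        stack, seen = list(info.calls), set()
        while stack:
            callee = stack.pop()
            if callee in seen or callee not in methods:
                continue
            seen.add(callee)
            target = methods[callee]
            if target.public:
                continue  # assumption: public callees remain separate nodes
            reached |= target.ivars
            stack.extend(target.calls)
        graph[name] = reached
    return graph


if __name__ == "__main__":
    # Hypothetical class: public area() uses w_ and calls private scale(), which uses h_.
    ms = {
        "area": MethodInfo(True, {"w_"}, {"scale"}),
        "scale": MethodInfo(False, {"h_"}),
        "name": MethodInfo(True, {"name_"}),
    }
    print(public_reference_graph(ms))
```

The resulting mapping can be fed directly into a CBMC computation over public methods only.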

5 Empirical Study

In order to demonstrate the effectiveness of our new measure CBMC, we conducted a case study with the InterViews system, a C++ toolkit for X Windows developed at Stanford University and later at Silicon Graphics. InterViews provides a set of classes that define the behavior of graphical user interface objects such as windows, buttons, menus, and documents. Figure 8 (a) and (b) show the distributions of instance variables and methods in the 134 InterViews classes employed for the study. As seen from Figure 8 (a), most classes have a small number of instance variables: about 72% of classes have no more than five instance variables. The solid bars in Figure 8 (b) show the distribution of the total number of methods in InterViews classes. About 48% of the classes have no more than ten methods, and some classes have more than fifty methods. The distribution of the normal methods, with special methods excluded, is shown as the hollow bars in Figure 8 (b). Excluding the special methods increases the proportion of classes with few methods: about 52% of classes have no more than ten methods, and no class has more than fifty methods.

Figure 8: Characteristics of InterViews classes ((a) Instance variables; (b) Methods)

Figure 9 describes the kinds of methods in InterViews classes when the algorithm in Figure 7 is applied. About 43% of methods are found to be special. These data show that special methods are common in a practical system and, thus, indicate the importance of our strategy for special methods in properly capturing the cohesiveness of classes. By examining the source code of InterViews classes, we also confirmed that the access methods and delegation methods identified by the proposed heuristics were indeed access and delegation methods.

Figure 9: Kinds of methods in InterViews

The identification of glue methods is rather complicated; in the worst case, all the combinations of the normal methods in a class must be examined. Therefore, it may be difficult to apply CBMC to large classes. However, the experiment on InterViews showed that the sets of glue methods were actually rather small. Some InterViews classes have many normal methods, as many as 48 (see the hollow bars in Figure 8 (b)), yet the maximum number of glue methods is only six. In addition, 70% of the classes with a connected interaction pattern have fewer than four glue methods. Figure 10 shows the distribution of the size of the glue-method sets in InterViews.

Figure 10: Glue methods in InterViews
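A back-of-the-envelope count shows why the small glue sets matter. For a class with 48 normal methods whose smallest glue set has six methods, the exhaustive search of Figure 6 tests every method subset of size 1 through 6 at the top level before it can stop. The Python sketch below counts those subsets with `math.comb` (Python 3.8 or later); it ignores the recursive calls on the resulting components, and `candidate_glue_sets` is a hypothetical helper name.

```python
from math import comb


def candidate_glue_sets(n_methods: int, max_k: int) -> int:
    """Number of method subsets of size 1..max_k that an exhaustive search tests."""
    return sum(comb(n_methods, k) for k in range(1, max_k + 1))


if __name__ == "__main__":
    # 48 normal methods, glue sets of at most six methods
    print(candidate_glue_sets(48, 6))   # 14196868
    # every subset size the loop of Figure 6 could reach (k = 1 .. 47)
    print(candidate_glue_sets(48, 47))  # 281474976710654, i.e. 2**48 - 2
```

Roughly 1.4 x 10^7 candidate sets is still feasible, whereas the unrestricted bound of about 2.8 x 10^14 is not, which is why an efficient glue-method algorithm is needed for larger classes.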

5.1 Measurement Results

The distributions of the connectivity factors, the structure factors, and the CBMCs of classes in InterViews are shown in Figure 11. Each figure shows how the values of those factors are distributed among the classes in InterViews.

Figure 11: Connectivity factor, structure factor, and CBMC ((a) Connectivity factor; (b) Structure factor; (c) CBMC)

The study reveals that many classes in InterViews have weak cohesion (see Figure 11 (c)), which is due to their low connectivity factors (see Figure 11 (a)). Figure 12 (a) shows the classification of classes according to the values of the connectivity factor Fc and the structure factor Fs, as described in Section 3.4. No class belongs to Case III (high Fc and low Fs) or Case IV (low Fc and low Fs), but many classes belong to Case II (low Fc and high Fs). Figure 12 (b) shows the distribution of the Case II classes over four different types, each of which is examined in detail:

Figure 12: The CBMCs of classes in InterViews ((a) Categories of classes; (b) Detailed examination of Case II)

No normal methods: The normal methods in a class perform relatively complex computation, possibly using many instance variables. In essence, the object behavior captured by the class is reflected in its normal methods. When a class does not have any normal methods, it may be an artifact of the code environment rather than a capture of some real object behavior. With no normal methods, there can be no public behavior; hence there is no use for such classes. InterViews has some classes exhibiting this pattern: eight classes have no methods at all, and a further five classes have no normal methods. These thirteen classes represent about 10% of the classes in this system.

No instance variables: The instance variables in a class represent the state information of the instantiated objects, and methods show different behaviors depending on the values of the instance variables. In other words, the instance variables in a class serve to glue the methods together, and the methods show their collaborating behaviors by sharing the instance variables. So a class with no instance variables seems to be a collection of unrelated methods rather than a representation of some objects. For example, class osMath has no instance variables despite having 25 methods. Class osMath is a collection of various arithmetic functions, such as addition and subtraction, for the numeric data types int, float, and double. The major object-oriented design methods recommend that a class be created for each specific numeric data type and that the arithmetic functions be defined within that class.

Isolated members: Many classes have isolated methods and/or instance variables. A method in a class is isolated if it has no interaction with the instance variables of the class. Similarly, an instance variable in a class is isolated if it is referenced by no method within the class. Many classes in InterViews are found to have one or more isolated methods. It is not reasonable to encapsulate in a class an isolated method whose behavior is unrelated to the remaining members of the class; thus, the existence of isolated methods is undesirable. An isolated method often operates only on its parameters. For example, method parse_value() in class StyleAttribute does not interact with any other members of the class, but performs some operations only on its parameter object of class String. It would therefore be better design practice to define method parse_value() in class String rather than in class StyleAttribute. As another example, class ivColor defines method find() to get an RGB value for a color name. Method find() interacts only with its parameter object of class ivDisplay, so it is more desirable to define find() in class ivDisplay than in class ivColor. Class ivGlyph has the isolated methods pick(), draw(), and print() to perform pick correlation, drawing, and printing, respectively. All of them require coordinate information on the glyph, and this information is supplied to those methods as a parameter object of class ivAllocation. These methods perform their behaviors only with their parameters. Thus, it is more desirable for class ivGlyph to include an instance variable of class ivAllocation, because the coordinates of each glyph are essential state information for glyph objects. Class ivGlyph also has some methods that do not perform any action at all; that is, they contain only one null statement. Methods append(), prepend(), insert(), remove(), replace(), change(), count(), and component() are examples of such methods.
These methods are introduced to provide child management operations, so they make sense only for composite glyph classes such as ivDeck, ivAggregate, and ivBox; they are not meaningful for primitive glyph classes such as ivCharacter, ivSpace, and ivLabel. Such empty methods define a default behavior of the child management operations for primitive glyph classes. They therefore violate the principle of class hierarchy design that a class should only define operations that are meaningful to its subclasses. In fact, class ivGlyph and its subclasses implement the Composite design pattern, and problems with such implementations of the Composite pattern have been discussed [19]. An isolated instance variable may imply that the instance variable is unnecessary for abstracting the state information of the objects. For example, classes StyleToColor, StyleToFancyColors, and StyleToFont have an isolated instance variable last_. Investigation reveals that last_ is used only in a constructor and can be derived from the other instance variables first_ and size_. Thus, last_ does not seem to be essential for capturing the state information of the objects; it would be more appropriate to declare last_ as a local variable in the constructor.

No interaction with inherited members: Many subclasses are found to have no interaction with the members inherited from their superclasses. Class ivResource and its subclasses show such an interaction pattern. Most InterViews classes are derived from class ivResource, but class ivResource and its subclasses represent very distinct aspects. While class ivResource is introduced specifically to manage shared objects within a program, its subclasses capture objects relevant to the graphical user interface, such as windows, buttons, menus, and documents. This behavioral difference between class ivResource and its subclasses leads to a disjoint interaction pattern in the subclasses: one component for class ivResource and another for the specific GUI element. This is partly because class ivResource is used for implementation inheritance. Implementation inheritance serves only to reuse the implementation of a superclass, without guaranteeing a generalization relationship between the subclass and the superclass. Unlike inheritance based on behavioral compatibility, implementation inheritance can lead to programs that are hard to maintain and extend. Thus, the use of implementation inheritance is commonly discouraged, and delegation is suggested as an alternative mechanism for achieving the desired code reuse [30].
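The first three Case II types above can be detected directly from a class's reference graph; the fourth needs inheritance information and is not covered. The Python sketch below is a minimal check under the assumption that the graph contains only normal methods (special methods already removed); `case_ii_symptoms` is a hypothetical name, and the example inputs are invented to resemble osMath and StyleToColor rather than taken from the InterViews sources.

```python
from typing import Dict, List, Set, Union


def case_ii_symptoms(graph: Dict[str, Set[str]], ivars: Set[str]) -> Dict[str, Union[bool, List[str]]]:
    """Report the structural symptoms discussed above for one class.

    graph maps each normal (non-special) method to the instance variables it
    references; ivars holds every instance variable of the class.
    """
    used = set().union(*graph.values())  # variables referenced by at least one normal method
    return {
        "no_normal_methods": not graph,
        "no_instance_variables": not ivars,
        "isolated_methods": sorted(m for m, refs in graph.items() if not (refs & ivars)),
        "isolated_variables": sorted(ivars - used),
    }


if __name__ == "__main__":
    # Hypothetical classes shaped like the examples above.
    print(case_ii_symptoms({"add": set(), "sub": set()}, set()))
    print(case_ii_symptoms({"lookup": {"first_", "size_"}}, {"first_", "size_", "last_"}))
```

Because constructors are special methods and are excluded from the graph, a variable used only in a constructor (like last_ above) is reported as isolated.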

5.2 Principal Component Analysis

To cope with the problems of the existing cohesion measures, we proposed a new cohesion measure, CBMC, based on the new notion of special methods and the new cohesion criterion of member connectivity. We performed Principal Component Analysis (PCA) [18] on InterViews to demonstrate that CBMC differs from the existing cohesion measures.

5.2.1 Analysis Procedure

Briand et al. [8] proposed a procedure for data analysis in order to make experiments more repeatable and comparable. We performed PCA according to that procedure, as follows:

1. Collect data for the cohesion measures. Our cohesion measurement tool HYSS can compute the existing cohesion measures mentioned in Section 2: LCOM1-5, Co, Coh, LCC, and TCC. Using HYSS, the value of each measure is computed for the InterViews classes.

2. Identify outliers. Outliers are data points located in an empty part of the sample space. Including or excluding outliers can have a large influence on the analysis results, so outliers should be identified and removed. In this paper, we calculate for each data point the Mahalanobis distance from the sample space centroid. Outliers are data points with a large distance from the centroid, and they are excluded from the analysis. The T²max procedure based on the Mahalanobis distance is used to detect outliers; see [24] for more details on outlier analysis.

3. Perform principal component analysis. If a group of variables in a data set are strongly correlated, these variables are likely to measure the same underlying dimension (i.e., class property) of the object being measured. PCA is a standard technique for identifying relations between the variables in a data set. Principal Components (PCs) are linear combinations of the standardized independent variables, calculated as follows. The first PC is the linear combination of all standardized variables that explains the maximum amount of variance in the data set. The second and subsequent PCs are linear combinations of all standardized variables, where each new PC is orthogonal to all previously calculated PCs and captures the maximum variance under these conditions. Usually, only a subset of the variables in a PC have large coefficients (also called the loadings of the variables) and therefore contribute significantly to its variance. The variables with high loadings help identify the dimension the PC is capturing. To produce a clearer pattern of loadings, we used varimax rotation, the most frequently used rotation technique.
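Step 3 can be illustrated with a small, dependency-free Python sketch that standardizes each measure, forms the correlation matrix, and extracts the first principal component by power iteration. It is not the tool chain used in this study: it omits the outlier screening of step 2 and the varimax rotation, assumes no column is constant (a constant column has zero standard deviation), and uses made-up sample data. Because the variables are standardized, the eigenvalues of the correlation matrix sum to the number of variables, so a PC's share of the variance is its eigenvalue divided by that number; with the ten measures of Table 4, for example, 4.82 / 10 gives the 48.2% reported for PC1.

```python
import math
from typing import List, Sequence, Tuple


def standardize(column: Sequence[float]) -> List[float]:
    """z-scores using the sample standard deviation (column must not be constant)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pearson correlations between the measures (one column per measure)."""
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    return [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(corr: List[List[float]], iterations: int = 1000) -> Tuple[float, List[float]]:
    """Leading eigenvalue and unit eigenvector (weights of the first PC) by power iteration.
    The uniform start vector must not be orthogonal to the leading eigenvector."""
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v equals the eigenvalue because v has unit length.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


if __name__ == "__main__":
    # Made-up values of three measures over six classes.
    a = [1, 2, 3, 4, 5, 6]
    b = [2, 4, 5, 8, 10, 12]
    c = [3, 1, 4, 1, 5, 9]
    ev, weights = first_component(correlation_matrix([a, b, c]))
    print(f"eigenvalue={ev:.3f} share={ev / 3:.1%} weights={[round(w, 3) for w in weights]}")
```

Subsequent PCs could be obtained by deflating the matrix and repeating the iteration; a statistics package would normally be used for that and for the rotation.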

5.2.2 Measurement Results

We measured cohesion values for 134 classes in InterViews using HYSS, performed outlier analysis on these 134 classes, and identified 7 outliers. Thus, 127 classes were used for PCA. Table 3 provides the descriptive statistics of the cohesion measures for the 127 classes, and Figure 13 shows the distributions of the existing cohesion measures on InterViews classes.

Figure 13: Distribution of the cohesion measures ((a) LCOM1; (b) LCOM2; (c) LCOM3; (d) LCOM4; (e) Co; (f) LCOM5; (g) Coh; (h) LCC; (i) TCC)

Table 3: Descriptive statistics for cohesion measures

Measure   Max.   75%    Median  25%    Min.   Mean    Std. Dev.
LCOM1     819    12     2       0      0      30.93   113.54
LCOM2     777    0      0       0      0      19.73   92.71
LCOM3     30     2      1       1      1      2.83    4.41
LCOM4     25     2      1       1      1      1.88    2.56
Co        1      1      1       0.91   0.33   0.90    0.16
LCOM5     0.94   0.55   0.42    0.32   0      0.41    0.24
Coh       1      0.67   0.62    0.02   0      0.48    0.34
LCC       1      1      1       1      0      0.88    0.29
TCC       1      1      1       0.7    0      0.83    0.30
CBMC      1      0.5    0       0      0      0.25    0.39

5.2.3 PCA Result

Through PCA, we identified five PCs that describe 94.6% of the variance in the data set. The loadings of each PC are given in Table 4, where the major coefficients for each PC are marked. For each PC, we also provide its eigenvalue, the proportion of the variance in the data set explained by the PC, and the cumulative variance. Based on the loadings of the cohesion measures, the PCs are interpreted as follows:

PC1 (48.2%): LCC, TCC, and Co depend on the ratio of method pairs with shared instance variables, and they consider indirect sharing of instance variables through method invocation. These measures are normalized; that is, they have upper and lower bounds.

PC2 (27.5%): LCOM1, LCOM2, and LCOM3 count the number of method pairs with shared instance variables. In addition, these measures are not normalized; that is, they have no upper bounds.

PC3 (7.9%): LCOM5 and Coh depend on instance variable usage; that is, they count the number of interactions between instance variables and methods. They are also normalized.

PC4 (5.9%): CBMC depends on the degree of connectivity among class members, and it takes into account only the methods that actually affect the cohesiveness of a class, by excluding special methods.

PC5 (5.1%): LCOM4 counts the sharing of instance variables, including indirect sharing of instance variables through method invocation.

The result of PCA shows that CBMC defines a dimension of its own: CBMC is the only major factor in PC4. This result indicates that our cohesion measure captures a different aspect of class properties. We believe that this uniqueness results from the fact that CBMC is based on ideas different from those of the existing measures: CBMC depends on connectivity among class members and takes into account special methods, which have no influence on cohesiveness. Except for LCOM4 in PC5, our experiment shows results similar to those of the PCA previously performed by Briand et al. [7]. In their study, LCOM4 belonged to the same PC as LCOM3, which indicated that method invocation did not significantly affect the distribution of LCOM4 in their data set. In the case of InterViews classes, however, indirect sharing of instance variables through method invocation seems to account for the difference between LCOM3 and LCOM4.

Table 4: Rotated Components

             PC1      PC2      PC3      PC4      PC5
Eigenvalue   4.82     2.75     0.80     0.59     0.52
Proportion   48.2%    27.5%    7.9%     5.9%     5.1%
Cumulative   48.2%    75.7%    83.6%    89.5%    94.6%
LCOM1        0.071    0.850*  -0.009   -0.013    0.028
LCOM2       -0.001    0.981*  -0.002   -0.070    0.116
LCOM3       -0.054    0.894*  -0.166   -0.047    0.345
LCOM4       -0.257    0.399   -0.196   -0.099    0.851*
Co           0.757*  -0.092    0.313    0.134   -0.216
LCOM5       -0.502    0.002   -0.819*  -0.134    0.171
Coh          0.396   -0.144    0.878*   0.144   -0.107
LCC          0.925*   0.022    0.299    0.141   -0.118
TCC          0.935*   0.025    0.303    0.114   -0.097
CBMC         0.191   -0.079    0.155    0.963*  -0.073

* major coefficient for the PC

6 Conclusion and Future Works

As the basic unit of object-oriented systems, classes have some characteristics not found in conventional systems, and such characteristics should be incorporated into the definition of cohesion measures; otherwise, cohesion measures could fail to properly reflect the cohesiveness of a class. Some methods in a class, so-called special methods, inherently interact with only some of the instance variables and, thus, should not reduce the cohesiveness of classes. In addition, the existing cohesion criteria can lead to results that differ from our expectations. We noted the problems in the existing cohesion measures caused by the lack of consideration of such characteristics of classes, and we proposed a new cohesion measure, CBMC, to cope with these problems and correctly reflect the cohesiveness of classes. CBMC considers the nature of special methods and adopts the connectivity among class members as a new cohesion criterion. In addition, the connectivity and structure factors are useful for understanding the design quality of classes. We developed HYSS, a cohesion measurement tool for C++ programs, to automate the computation of various cohesion measures including CBMC. Using HYSS, we performed a case study with the InterViews system to demonstrate the effectiveness of CBMC. We discovered many classes of weak cohesion in InterViews and described in detail which deficiencies of those classes led to such low cohesion. We also performed principal component analysis on InterViews to identify the relationships between the existing cohesion measures and CBMC. The result of PCA showed that CBMC captures a new aspect of class properties that is not captured by the existing cohesion measures. Our research can be extended in several directions. The notion of special methods is unique to CBMC, so their correct identification is very important. In the case study, we provided reasonable heuristics for identifying access methods and delegation methods. More accurate heuristics could be developed using more precise information on methods, such as data dependencies between instance variables. We are also trying to discover other kinds of methods that can be classified as special methods.
Any method can be classified as a special method if it can achieve its intended behavior by interacting with only some of the instance variables in a class. The computation of CBMC is rather complicated because of the complexity of glue-method identification and the recursive definition of CBMC. The current version of HYSS examines all possible combinations of normal methods to identify glue methods. To apply CBMC to larger classes, it is necessary to devise an efficient algorithm for identifying glue methods. Currently, CBMC focuses on how properly a class captures the state and behavior of the corresponding objects. However, low CBMC may also have an indirect impact on external quality attributes of classes, such as maintainability and reusability; for example, a class of high CBMC is likely to be easy to maintain and reuse. An empirical study of the relationship between CBMC and those external quality factors remains future work.


References

[1] V. R. Basili, L. C. Briand, and W. L. Melo, 'A Validation of Object-Oriented Design Metrics as Quality Indicators', IEEE Transactions on Software Engineering, 22(10), 751-761 (1996).
[2] J. M. Bieman and L. M. Ott, 'Effects of Software Changes on Module Cohesion', Proceedings of IEEE/ACM Conference on Software Maintenance, 1992, pp. 345-353.
[3] J. M. Bieman and B.-K. Kang, 'Cohesion and Reuse in an Object-Oriented System', Proceedings of ACM Symposium on Software Reusability, 1995, pp. 259-262.
[4] G. Booch, Object-Oriented Analysis and Design with Applications, Benjamin/Cummings Publishing Company, Inc., 1991.
[5] L. C. Briand, S. Morasca, and V. R. Basili, 'Defining and Validating High-Level Design Metrics', Technical Report CS-TR-3301-1, University of Maryland, 1994.
[6] L. C. Briand, J. W. Daly, and J. Wüst, 'A Unified Framework for Cohesion Measurement in Object-Oriented Systems', Empirical Software Engineering Journal, 3(1), 65-117 (1998).
[7] L. C. Briand, J. W. Daly, and J. Wüst, 'A Comprehensive Empirical Validation of Design Measures for Object-Oriented Systems', Proceedings of 5th International Symposium on Software Metrics, 1998, pp. 246-257.
[8] L. C. Briand, J. Wüst, J. W. Daly, and D. V. Porter, 'Exploring the Relationships between Design Measures and Software Quality in Object-Oriented Systems', Journal of Systems and Software (to be published).
[9] N. N. Card, G. T. Page, and F. E. McGarry, 'Criteria for Software Modularization', Proceedings of 8th International Conference on Software Engineering, 1985, pp. 372-377.
[10] N. N. Card, V. E. Church, and W. W. Agresti, 'An Empirical Study of Software Design Practices', IEEE Transactions on Software Engineering, 12(2), 264-271 (1986).
[11] H.-S. Chae and Y.-R. Kwon, 'Assessing and Restructuring of Classes Based on Cohesion', Proceedings of Asia-Pacific Software Engineering Conference, 1996, pp. 76-82.
[12] H.-S. Chae and Y.-R. Kwon, 'A Cohesion Measure for Classes in Object-Oriented Systems', Proceedings of 5th International Symposium on Software Metrics, 1998, pp. 158-166.
[13] S. R. Chidamber and C. F. Kemerer, 'Towards a Metrics Suite for Object Oriented Design', Proceedings of 6th ACM Conference on Object-Oriented Systems, Languages, and Applications, 1991, pp. 197-211.

14] S. R. Chidamber and C. F. Kemerer, `A Metrics Suite for Object Oriented Design', IEEE Transactions on Software Engineering, 20(6), 476-493 (1994). 15] B. J. Cox, `There is a silver bullet', Byte, 15(10), 209-218 (1990). 16] B. J. Cox, `Planning the software industrial revolution', IEEE Transactions on Software Engineering, 7(6), 25-33 (1990). 17] P. Devanbu, `GENOA a customizable, language- and front-end independent code analyzer', Proceedings of 15th International Conference on Software Engineering, 1992, pp. 307-317. 18] G. Dunteman, Principal Component Analysis, SAGE Publications, 1989. 19] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1994. 20] B. S. Gupta, `A Critique of Cohesion Measures in the Object-Oriented Program', Technical Report, Michigan Technological University, 1997. 21] B. Henderson-Sellers, Software Metrics, Prentice Hall, 1996. 22] M. Hitz and B. Montazeri, `Measuring Coupling and Cohesion in Object-Oriented Systems', Proceedings of International Symposium on Applied Corporate Computing, 1995. 23] M. Hitz and B. Montazeri, `Chidamber and Kemerer's Metrics Suite: A Measurement Theory Perspective', IEEE Transactions on Software Engineering, 22(4), 267-271 (1996). 24] J. D. Jobson, Applied Multivariate Data Analysis Volume II: Categorical and Multivariate Methods, Springer-Verlag, 1992. 25] R. E. Johnson and B. Foote, `Designing Reusable Classes', Journal of Object-Oriented Programming, 1(2), 22-35 (1988). 26] W. Li and S. Henry, `Object-Oriented Metrics that Predict Maintainability', Journal of Systems and Software, 23(2), 111-122 (1993). 27] W. Li and S. Henry, `Maintenance Metrics for the Object-Oriented Paradigm', Proceedings of 1st International Symposium on Software Metrics, 1993, pp. 52-60. 28] K. J. Lieberherr and I. M. Holland, `Assuring Good Style for Object-Oriented Programs', IEEE Software, 6(9), 38-48 (1989). 29] L. M. Ott, J. M. Bieman, B. -K. 
Kang, and B. Mehra, `Developing Measures of Class Cohesion for Object-Oriented Software', Proceedings of 7th Annual Oregon Workshop on Software Metrics, 1995. 30] J. Rumbaugh, et al., Object-Oriented Modeling and Design, Prentice Hall, 1991. 33

31] A. Snyder, `Encapsulation and Inheritance in Object-Oriented Programming Languages', Proceedings of 1th ACM Conference on Object-Oriented Systems, Languages, and Applications, 1986, pp. 84-91. 32] W. Stevens, G. Myers, and L. Constantine, `Structured Design', IBM Systems Journal, 13(2), 115-139 (1974). 33] N. Wilde and R. Huitt, `Maintenance Support for Object-Oriented Programs', IEEE Transactions on Software Engineering, 18(12), 1038-1044 (1992).
