A Service-Oriented Componentization Framework for Java Software Systems

by

Shimin Li

A thesis presented to the University of Waterloo in fulfilment of the thesis requirement for the degree of Master of Applied Science in Electrical and Computer Engineering

Waterloo, Ontario, Canada, 2006

© Shimin Li 2006

I hereby declare that I am the sole author of this thesis. I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the purpose of scholarly research.

Shimin Li

I further authorize the University of Waterloo to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Shimin Li


Abstract

Service-oriented computing has dramatically changed the way in which we develop software systems. In the fast-growing global market for services, providing competitive services is critical for the success of businesses and organizations. Since many competitive services have already been implemented in existing systems, leveraging the value of an existing system by exposing all or parts of it as services within a service-oriented environment has become a major concern in today's industry.

In this work, we categorize the services embedded in a system into two categories: i) top-level services, which are not used by another service but may contain a hierarchy of low-level services further describing and modularizing the service, and ii) low-level services, which lie underneath a top-level service and may be agglomerated with other low-level services to yield a new service with a higher level of granularity.

To meet the demand of identifying and reusing the business services embedded in an existing software system, we present a novel service-oriented componentization framework that automatically supports: i) identifying critical business services embedded in an existing Java system by utilizing graph representations of the system models, ii) realizing each identified service as a self-contained component that can be deployed as a single unit, and iii) transforming the object-oriented design into a service-oriented architecture. A toolkit implementing our framework has been developed as an Eclipse Rich Client Platform (RCP) application. Our initial evaluation has shown that the proposed framework is effective in identifying services from an object-oriented design and migrating it to a service-oriented architecture.


Acknowledgments

First and foremost, I am deeply indebted to my supervisor, Professor Ladan Tahvildari, for her patient academic (and personal) guidance over the years. Her passion for doing and communicating innovative and creative science has been and always will be a great source of inspiration. I feel very privileged to have worked with her. I wish to thank the members of my dissertation committee, Professor Kostas Kontogiannis and Professor Sagar Naik, for having accepted to take the time out of their busy schedules to read my thesis and provide me with invaluable comments and inspiring remarks. I would like to thank all members of the Software Technologies and Applied Research (STAR) group for their tremendous support and cooperation. I want to thank my parents, who have been extremely understanding and supportive of my studies. I want to thank my wonderful wife, Wei, who has encouraged me so much over the years. I also want to thank my lovely son, Zihan, for letting Dad work on his dissertation when he needed to do so. I feel very lucky to have a family that shares my enthusiasm for academic pursuits.


Contents

1 Introduction
  1.1 Problem Description
  1.2 Thesis Contribution
  1.3 Thesis Organization

2 Related Work
  2.1 Program Comprehension
    2.1.1 Feature Locating
    2.1.2 Software Clustering
  2.2 Program Migration
    2.2.1 Migrating Procedural Legacy Systems to Object-Oriented Paradigm
    2.2.2 Re-Engineering Existing Object-Oriented Systems
  2.3 Architecture Recovery
  2.4 Software Reuse
    2.4.1 Identification of Reusable Components in Source Code
    2.4.2 Creation of Services from Legacy Systems
  2.5 Summary

3 Service-Oriented Componentization Framework
  3.1 Framework Overview
  3.2 Architecture Recovery
  3.3 Service Identification
  3.4 Component Generation
  3.5 System Transformation
  3.6 Summary

4 Architecture Recovery
  4.1 XML Schema Representation
    4.1.1 UML Profile for XML Schemas
    4.1.2 Representing XML Schemas in UML
  4.2 Modeling Source Code
    4.2.1 Approach
    4.2.2 Source Code Models
  4.3 Modeling Architecture
    4.3.1 Definitions of Class Relationships
    4.3.2 Approach
    4.3.3 Class/Interface Relationship Graph
    4.3.4 Class/Interface Dependency Graph
    4.3.5 An Example: Car Rental System
  4.4 Summary

5 Service Identification
  5.1 Service Representations
  5.2 Supporting Concepts
    5.2.1 Graph Techniques
    5.2.2 Dominance Analysis
    5.2.3 Modularization Quality Metric
  5.3 The Proposed Processes
    5.3.1 Top-Level Service Identification
    5.3.2 Low-Level Service Identification
    5.3.3 An Example: Car Rental System
  5.4 Summary

6 Component Generation and System Transformation
  6.1 Component Generation
    6.1.1 Approach
    6.1.2 An Example
  6.2 System Transformation
    6.2.1 Approach
    6.2.2 An Example
  6.3 Summary

7 Empirical Studies
  7.1 A Prototype for the SOC4J Framework
    7.1.1 Tool Integration Requirements
    7.1.2 JComp RCP Application
  7.2 Evaluation Criteria
    7.2.1 Component Reusability
    7.2.2 Architectural Improvement
  7.3 Case Study: Jetty
    7.3.1 Statistics of the Jetty
    7.3.2 Discussions on Obtained Results
  7.4 Case Study: Apache Ant
    7.4.1 Statistics of the Apache Ant
    7.4.2 Discussions on Obtained Results
  7.5 Summary

8 Future Directions and Conclusions
  8.1 Contributions
  8.2 Future Work
  8.3 Conclusions

A Top-Level Services of Jetty

B Top-Level Services of Apache Ant

List of Tables

4.1 The Metric Suite at Class Level
7.1 Statistics of the Jetty
7.2 Top-Level Services Identified from Jetty
7.3 Low-Level Services Identified in Top-Level Service Win32 Server
7.4 Some Time and Space Statistics of the SOC4J Framework on the Case Study: Jetty
7.5 Statistics of the Apache Ant
7.6 Selected Top-Level Services Identified from Apache Ant
7.7 Low-Level Services Identified in Top-Level Service WAR File Creation
7.8 Some Time and Space Statistics of the SOC4J Framework on the Case Study: Apache Ant
A.1 Top-Level Services of Jetty (1)
A.2 Top-Level Services of Jetty (2)
B.1 Top-Level Services of Apache Ant (1)
B.2 Top-Level Services of Apache Ant (2)
B.3 Top-Level Services of Apache Ant (3)
B.4 Top-Level Services of Apache Ant (4)
B.5 Top-Level Services of Apache Ant (5)

List of Figures

2.1 The Conceptual Model of Eisenbarth's Approach
2.2 The Block Diagram of the Quality-Based Re-engineering Process
2.3 The Dali Workbench
3.1 The Architecture of the Service-Oriented Componentization Framework
4.1 The Approach for Source Code Modeling
4.2 The Meta-Model for Java Package Models
4.3 The Meta-Model for Java Source File Models
4.4 The Meta-Model for Java Class/Interface Models
4.5 The Meta-Model for Java Method/Constructor Models
4.6 The Approach for Architecture Modeling
4.7 The UML Representation of XML Schema for Nodes in the CIRG
4.8 The UML Representation of XML Schema for Nodes in the CIDG
4.9 The CIRG of the Car Rental System (CRS)
4.10 The CIDG of the Car Rental System (CRS)
5.1 The UML Representation of XML Schema for a Service
5.2 An Example of a Directed Graph
5.3 (a) A connected component of the directed graph G in Figure 5.2. (b) The other connected component of G. (c) The only strongly connected component of G. (d) A rooted component of graph (a). (e) The other rooted component of graph (a)
5.4 (a) A Simple Directed Graph. (b) The Dominance Tree Corresponding to the Graph in (a). (c) The Two Maximal Consolidation Subtrees of the Dominance Tree in (b)
5.5 Processes in Service Identification Stage
5.6 The MCIDGs of the Car Rental System
5.7 The SHG of the Top-Level Service VehicleBooking
5.8 The Result SHG of Performing the SHG Transformation on the Original SHG of the Top-Level Service VehicleBooking in the CRS System
5.9 The Service Dominance Tree of the SHG in Figure 5.8
5.10 The Reduced Dominance Tree of the Service Dominance Tree in Figure 5.9
5.11 The SHG Reconstructed from the Reduced Service Dominance Tree in Figure 5.10
6.1 The UML Representation of XML Schema for a Component
6.2 The UML Class Diagrams of Customer and Person in the CRS System
6.3 Part of the UML Class Diagram of the Component Customer
6.4 The Meta-Model for the Component-Based Target System
6.5 The Service Hierarchy Graphs of the CRS System
6.6 The Component Hierarchy Graphs of the CRS System
6.7 The Component-Based Car Rental System
7.1 The Tool Interconnection for the SOC4J Framework
7.2 The Architecture of the JComp Java Componentization Kit
7.3 A Snapshot of the JComp Java Componentization Kit
7.4 The Component Reusability Model
7.5 The Accepted Service View of the Extractor Plug-in
7.6 Iterations of the Service Aggregation Process of Top-Level Service Win32 Server
7.7 The CHG of Top-Level Component Win32 Server of the Jetty
7.8 The Reusability of Components Extracted from Jetty
7.9 The CHG of Top-Level Component WAR File Creation of the Apache Ant
7.10 The Reusability of Components Extracted from the Apache Ant

Chapter 1

Introduction

Billions of dollars are spent each year on computer software, much of it on creating and testing new source code. To save money, increase productivity, and improve quality and reliability, academic and industrial institutions have put a great deal of effort into reusing existing software. The arrival of a new software technology creates the need to leverage existing software assets in order to take advantage of that technology, but re-implementing business-critical applications whenever a new technology arrives is impossible given the time and resources required. The only option, then, is software re-engineering. Examples of new software technologies that have created a large demand and market for suitable legacy-system re-engineering, wrapping, and evolution methods are distributed object technology, component technology, the World Wide Web (WWW), and XML.

Service-oriented computing has the potential to drastically change the way we develop software. While global markets for services provide the potential for reuse at a much greater scale, providing competitive services to these markets will be critical to the realization of this vision as a whole, as well as to the success of individual businesses. However, much of what would make up competitive services is already implemented in existing systems. The challenge, then, is how to transform the functionality of existing legacy systems, fully or partially, into services. Identifying, extracting, and re-engineering software components that implement abstractions within existing systems is a promising, cost-effective way to create reusable assets and re-engineer existing software systems.


Today, more and more organizations are migrating to service-oriented architectures (SOAs) to achieve net-centric operations. This offers the potential of leveraging legacy systems by exposing some parts of a system as services within the SOA. However, there is often a lack of effective engineering approaches for identifying, describing, modeling, and realizing the services embedded in existing software systems. The core of an SOA is the service: a coarse-grained, discoverable, and self-contained software entity that interacts with applications and other services through a loosely coupled, often asynchronous, message-based communication model. Reusing an existing software system requires a comprehensive framework to identify and extract the critical business services embedded in it. A business service of a software system is an abstract resource that represents a capability of performing tasks that form a coherent functionality from the points of view of provider entities and requester entities [40].

Effective system reuse and evolution require both the "big picture" and the lower-level dependencies between portions of the source code. The focal point of the proposed research is to exploit the synergy between the areas of Program Comprehension [9, 21, 38, 69, 97, 100], Architecture Recovery [43, 54, 55, 59, 60, 63, 64], Software Reuse [34, 35], and Program Migration [57, 80–85, 96]. In this context, our goal is to develop a service-oriented componentization framework that decomposes an existing object-oriented system in order to re-modularize the existing assets to support service functionality. More specifically, the proposed framework should automatically support: i) identifying critical business services embedded in an existing Java system, ii) realizing each identified service as a self-contained component, and iii) transforming the object-oriented design into a service-oriented architecture. To be of practical use, such a re-engineering environment should be generic in the sense of being able to support different existing object-oriented systems.


In other words, it must be built upon a meta-model of existing object-oriented systems rather than upon a particular existing system. This avoids the cost of developing a dedicated evolution environment for each target system. The environment should therefore be configurable with a model of the target existing system, which parameterizes the evolution environment with the existing system to be evolved and serves as a basis for specifying the components to be created.

This research addresses a problem that has challenged the research community for several years, namely the asset reuse of existing object-oriented systems. It also devises a framework in which reuse and evolution activities do not occur in a vacuum, but can be monitored and fine-tuned by the user in order to address specific quality requirements for the extracted components and the evolved target system, such as component granularity and reusability, and system maintainability.

1.1 Problem Description

An effective way of leveraging the value of existing systems is to expose their functionalities as reusable components to a larger number of clients through well-defined component interfaces. Each component encapsulates a business service, such as processing a payment, converting a currency, or computing an insurance quotation. In general, we have found that the code of existing systems represents a set of components with significant reuse potential. However, because an existing system often lacks architectural or other high-level documentation, it is difficult to understand both the "big picture" and the lower-level dependencies between portions of the code. From the implementation point of view, the challenge consists of two phases:

• Reverse Engineering: Identifying and extracting the top-level functions of an existing software system, and providing service descriptions for these identified functions.

• Forward Engineering: Performing any necessary transformations to migrate the monolithic architecture of the existing system to a more flexible service-oriented architecture.


In this thesis, we are interested in the reverse engineering challenge. Service identification is complicated by the usual obstacles of having to deal with potentially large and poorly structured existing systems. Identifying service candidates for packaging as reusable components requires analysis of massive amounts of legacy code, or at least of graphic representations of the code. Additionally, it requires the intervention of people with a background in the business domain to judge which functions are likely to make reusable services.

The identification of functions suitable for exposure as services can be seen as an instance of a more generic problem: the functional decomposition of existing systems. Here we are required to abstract the code, or an alternative code representation (e.g., XML or graphs), to higher-level representations that describe the system architecture in terms of its functional units. Moreover, the access points to these functional units need to be identified as well.

To reuse the identified services and migrate the existing system's implementation into a component-based architecture, it might be necessary to package the identified services into well-documented and self-contained components during the forward engineering phase. If service packaging is required, then techniques are needed for automatically extracting the relevant procedural elements from existing systems and creating an interface for the components. Furthermore, a formal description needs to be developed for each service. Service descriptions should document possible dependencies between service invocations, besides syntactic information on the number and types of parameters. Such descriptions are crucial for the developers who implement applications based on the extracted services, and should therefore be presented in a way they can understand.

We seek a combination of solutions from three different domains in order to tackle the service identification, service modeling, and service packaging problems:


• Source Code Analysis and Reverse Engineering Technology. We aim to create a framework that has methodological and technological steps to recover higher-level design and architecture representations of existing software systems based on the source code artifacts. This includes the creation of a suitable representation of design and architectural models that reflect the functional decomposition of the system. To distinguish these models from each other: design models are more detailed and refer to different parts of a system, whereas architectural models are more abstract and refer to the system as a whole. Our starting point for this line of work is exploring the existing body of work on architecture recovery and reconstruction, as well as software clustering, in search of suitable algorithms and ideas.

• UML Technology. Models in the UML will provide a high-level representation of analysis results and service descriptions that is understandable to both software developers and business experts. As a universal language, the UML provides standard notations for almost all aspects of a system. Structural features like data types, operation signatures, and architectures are captured by class and component diagrams. System behavior, including scenarios, processes, and protocols, is captured by sequence or activity diagrams as well as state-charts. We use component diagrams to provide a high-level overview of the proposed services, components, and their interfaces. Based on this representation, users can validate the proposed services.

• Graph Transformation Technology. We utilize graph transformation technology to implement mappings between different graphical representations of programs and models. The strength of the approach lies in the fact that model transformations can be expressed graphically, based on Meta-Object Facility (MOF) models for the source and target models. In the service identification phase, graph transformation can also be used to agglomerate services; a toy sketch of one such agglomeration step follows this list.
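To make the idea of agglomeration concrete, the following is a toy sketch in Java of one such step: merging a service node b into a node a of a dependency graph while redirecting the incident edges. It is our own illustration under simplified assumptions (services as plain strings, dependencies as adjacency sets, hypothetical service names), not the MOF-based transformation rules themselves.

import java.util.*;

/** Toy graph transformation: agglomerate service node b into node a. */
public class ServiceAgglomeration {
    public static void merge(Map<String, Set<String>> graph, String a, String b) {
        // Redirect b's outgoing edges so they originate from a (no self-loop).
        Set<String> out = graph.remove(b);
        if (out != null) {
            out.remove(a);
            graph.get(a).addAll(out);
        }
        // Redirect edges pointing at b so that they point at a instead.
        for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
            if (e.getValue().remove(b) && !e.getKey().equals(a))
                e.getValue().add(a);
        }
    }

    public static void main(String[] args) {
        // Hypothetical service dependency graph, as adjacency sets.
        Map<String, Set<String>> g = new HashMap<>();
        g.put("Booking",  new HashSet<>(Set.of("RateCalc")));
        g.put("RateCalc", new HashSet<>(Set.of("Tax")));
        g.put("Billing",  new HashSet<>(Set.of("RateCalc", "Tax")));
        g.put("Tax",      new HashSet<>());

        merge(g, "Booking", "RateCalc");  // agglomerate RateCalc into Booking
        System.out.println(g);  // Booking -> {Tax}, Billing -> {Booking, Tax}
    }
}

In the framework itself, such merges are guided by the analyses of Chapter 5 rather than applied by hand.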

1.2 Thesis Contribution

This thesis aims to design a framework that helps reuse the assets of existing systems and migrate their object-oriented designs to service-oriented architectures. It addresses the long-standing problem of reusing and evolving existing object-oriented systems in the following ways:

• By designing and implementing comprehensive graph representations of an object-oriented system at different levels of abstraction.

• By exploring an incremental program comprehension approach, including describing an object-oriented software system using different concurrent views, each of which addresses a specific set of concerns of the system.

• By designing and implementing an efficient and effective methodology for identifying and realizing critical business services embedded in an existing object-oriented system.

• By designing and implementing an object-oriented restructuring methodology that transforms the typically monolithic architectures of existing systems into more flexible service-oriented architectures.

• By designing and implementing a prototype system that supports the identification and realization of critical business services embedded in a Java software system and the componentization of that system.

1.3 Thesis Organization

This thesis is organized as follows:

• Chapter 2 reviews the related work, with the aim of putting this thesis in context. It covers four research areas that form the foundation of this thesis: Program Comprehension, Architecture Recovery, Software Reuse, and Program Migration.


• Chapter 3 gives an overview of the service-oriented componentization framework for Java software systems. This framework uses graph representations of an existing object-oriented software system and graph transformations to identify the business services embedded in the system. Furthermore, the framework realizes each identified service as a self-contained component and transforms the object-oriented design into a service-oriented architecture. The proposed framework is composed of four stages: Architecture Recovery, Service Identification, Component Generation, and System Transformation.

• Chapter 4 discusses the reverse engineering techniques used within the architecture recovery stage to build source code models and architectural models of an existing object-oriented software system.

• Chapter 5 presents the service identification strategy and algorithms that are used within the service identification stage to identify critical business services embedded in an existing object-oriented system.

• Chapter 6 discusses the processes within the component generation stage and the system transformation stage. It covers the service packaging technique and the architecture reconstruction technique.

• Chapter 7 shows the application of the proposed service-oriented componentization framework to some real-world Java projects. The prototype of the framework and the framework evaluation criteria will be introduced. The case studies will be explained and the results discussed.

• Chapter 8 presents the conclusions of this research work and discusses possible directions future research might take.

• Appendices A and B list and describe the business services identified in the case studies.

Chapter 2

Related Work

In this chapter we review the related work, with the aim of putting this thesis in context. We survey four research areas that form the foundation of this thesis, namely Program Comprehension, Program Migration, Architecture Recovery, and Software Reuse. The Program Comprehension section outlines approaches for locating features in source code and techniques for software clustering. The Program Migration section discusses current methodologies for migrating procedural legacy systems to the object-oriented paradigm and for re-engineering existing object-oriented systems. The Architecture Recovery section presents the technologies used in the software architecture recovery domain. The Software Reuse section reviews techniques for identifying reusable components in source code and for creating services from legacy systems. Finally, the last section summarizes the material presented in this chapter.

2.1 Program Comprehension

The identification of potentially reusable services embedded in an existing system requires an understanding of the functionality of each part of the system. Program understanding, or analysis in general, includes any activity that uses dynamic or static methods to reveal the properties of existing systems. It most commonly refers to an examination of the source code, without the use of any specification or execution information. There are two main subjects related to our work: Feature Locating and Software Clustering.

2.1.1 Feature Locating

A feature is a realized functional requirement of a system [30]. Generally, the term feature also subsumes non-functional requirements; in the context of this research, only functional features are relevant. That is, we consider a feature to be an observable behavior of the system that can be triggered by the user. Understanding the implementation of a certain feature of a system requires identifying the computational units of the system that contribute to that feature. In many cases, the mapping of features to the source code is poorly documented.

Wilde et al. [93] were pioneers in locating features, taking a fully dynamic approach. The goal of their Software Reconnaissance is to support maintenance programmers when they modify or extend the functionality of a legacy system. A computational unit is an executable part of a system; examples are instructions (such as accesses to global variables), basic blocks, routines, classes, compilation units, components, modules, or subsystems. Based on the execution of test cases for a particular feature f, several sets of computational units are identified:

• computational units commonly involved (code executed in all test cases, regardless of f),

• computational units potentially involved in f (code executed in at least one test case that invokes f),

• computational units indispensably involved in f (code executed in all test cases that invoke f), and

• computational units uniquely involved in f (code executed exactly in the cases where f is invoked).
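To make the set algebra behind these categories concrete, the following is a minimal Java sketch. It is our own illustration, not Wilde's tooling: it assumes each test case has already been traced into a set of executed unit names, and all unit names are hypothetical. The commonly involved units would simply be the intersection over all traces, invoking or not.

import java.util.*;

/** Illustrative set algebra behind Software Reconnaissance. */
public class FeatureLocator {

    /** Units executed by at least one test case that invokes f. */
    static Set<String> potentiallyInvolved(List<Set<String>> invokingTraces) {
        Set<String> union = new HashSet<>();
        for (Set<String> trace : invokingTraces) union.addAll(trace);
        return union;
    }

    /** Units executed by every test case that invokes f. */
    static Set<String> indispensablyInvolved(List<Set<String>> invokingTraces) {
        Set<String> inter = new HashSet<>(invokingTraces.get(0));
        for (Set<String> trace : invokingTraces) inter.retainAll(trace);
        return inter;
    }

    /** Units executed only when f is invoked: potential units minus any
     *  unit executed by a test case that does not invoke f. */
    static Set<String> uniquelyInvolved(List<Set<String>> invokingTraces,
                                        List<Set<String>> excludingTraces) {
        Set<String> unique = potentiallyInvolved(invokingTraces);
        for (Set<String> trace : excludingTraces) unique.removeAll(trace);
        return unique;
    }

    public static void main(String[] args) {
        // Hypothetical traces: each set lists the units one test case executed.
        List<Set<String>> invoking = List.of(
                Set.of("parseInput", "bookVehicle", "logAccess"),
                Set.of("parseInput", "bookVehicle", "computeRate"));
        List<Set<String>> excluding = List.of(
                Set.of("parseInput", "logAccess", "listVehicles"));

        System.out.println(potentiallyInvolved(invoking));          // all four traced units
        System.out.println(indispensablyInvolved(invoking));        // parseInput, bookVehicle
        System.out.println(uniquelyInvolved(invoking, excluding));  // bookVehicle, computeRate
    }
}

The same operations underlie the execution-slice comparison of Wong et al. described next: the uniquely involved units are what remains after subtracting the slices of the excluding input set.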


Since the primary goal is the location of starting points for further investigation, Wilde et al. focus on locating specific computational units rather than all required computational units.

Another approach, based on dynamic information, was presented by Wong et al. [95]. They analyzed the execution slices of test cases implementing a particular functionality. The process was described as follows:

1. The invoking input set I (i.e., a set of test cases) that will invoke a feature is identified.

2. The excluding input set E that will not invoke the feature is identified.

3. The program is executed twice, using I and E separately.

4. By comparing the two resulting execution slices, the computational units that implement the feature can be identified.

In [94], Wong et al. presented a way to quantify features. Metrics are provided to compute the dedication of computational units to features, the concentration of features in computational units, and the disparity between features.

In [21], Chen and Rajlich proposed a semiautomatic method for feature location in which the programmer browses the statically derived Abstract System Dependency Graph (ASDG). The ASDG describes detailed dependencies among routines, types, and variables at the level of global declarations. Navigation of the ASDG is computer-aided, but the programmer carries out the entire search for a feature's implementation. The method takes advantage of the programmer's experience with the analyzed software; it is less suited to locating features when programmers without any prior knowledge do not know where to start the search.


Eisenbarth et al. [30] presented a semiautomatic technique that reconstructs the mapping for features that are triggered by the user and exhibit an observable behavior. The mapping is in general not injective; that is, a computational unit may contribute to several features. Their technique allows for the distinction between general and specific computational units with respect to a given set of features. For a set of features, it also identifies jointly and distinctly required computational units. The presented technique combines dynamic and static analysis to rapidly focus on the system’s parts that relate to a specific set of features. Dynamic information is gathered based on a set of scenarios invoking these features. Figure 2.1 illustrates the conceptual model used by Eisenbarth et al. It describes the relationships among features, scenarios, and computational units.

[Figure: UML diagram of the conceptual model. A Scenario invokes Features, and a Feature is implemented by Computational Units (both relationships many-to-many); Basic Block, Routine, and Module are kinds of Computational Unit.]

Figure 2.1: The Conceptual Model of Eisenbarth’s Approach.

In [92], Wilde and Rajlich compared two feature-locating approaches, namely the Software Reconnaissance technique and the Dependency Graph Search method. In the presented case study, both techniques were effective in locating features. Software Reconnaissance proved more suited to large, infrequently changed programs, whereas the Dependency Graph Search method was found to be more effective if further changes are likely and require a deeper and more complete understanding.


2.1.2 Software Clustering

Clustering techniques have been used in many disciplines to support the grouping of similar objects of a system. Clustering analysis is a technique for combining observations into groups, or clusters, such that each cluster is homogeneous, or compact, with respect to certain characteristics, and each cluster differs from the others with respect to those same characteristics [73]. The primary objective of clustering analysis is to take a set of objects and characteristics with no apparent structure and impose a structure upon them with respect to a characteristic, thereby facilitating a better understanding of the observations and the subsequent construction of complex knowledge structures from features and object clusters. Most clustering approaches attempt to provide solutions for restructuring legacy systems.

Belady and Evangelisti introduced an approach that automatically clusters a software system in order to reduce its complexity [6]. They also provided a measure for the complexity of a system after it has been clustered. Their clustering approach was based on information extracted from the documentation of the system.

Müller et al. [63, 64] implemented several software clustering heuristics in the Rigi tool that (i) measure the relative strength between interfaces, (ii) identify omnipresent modules, and (iii) use the similarity between module names. They introduced the important principles of small interfaces (the number of elements of a subsystem that interface with other subsystems should be small compared to the total number of elements in the subsystem) and of few interfaces (a given subsystem should interface with only a small number of the other subsystems).

Hutchens and Basili [43] developed an algorithm that clusters procedures into modules by measuring the interaction between pairs of procedures. Their clustering technique was based on data bindings, a data binding being defined as an interaction between two procedures based on the location of variables that are within the static scope of both procedures. From the data bindings, a hierarchy is constructed from which a partition can be derived. They compared their structures with the developers' mental model, with satisfactory results, and evaluated the stability of the system, focusing on what happens to the clustering when changes are made.

Mancoridis et al. [55] treated clustering as an optimization problem and used genetic algorithms to overcome the local-optima problem of hill-climbing algorithms, which are commonly used for clustering. They implemented a tool called Bunch [54] that generates better results faster when users are able to integrate their knowledge into the clustering problem. They also showed how the subsystem structure of a system can be maintained incrementally after the original structure has been produced.
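To give a flavor of this optimization view of clustering, the sketch below is a toy, single-move hill climber over a partition of a dependency graph. It is our own illustration, not Bunch itself: the entities and edges are hypothetical, and the scoring formula is a simplified modularization-quality-style measure of our own (each non-empty cluster contributes intra / (intra + inter/2)).

import java.util.*;

/** Toy hill climbing over a partition of a dependency graph. */
public class HillClimbClustering {
    record Edge(String from, String to) {}

    /** Simplified MQ-style score rewarding intra-cluster edges. */
    static double quality(Map<String, Integer> assign, List<Edge> edges, int k) {
        double[] intra = new double[k], inter = new double[k];
        for (Edge e : edges) {
            int a = assign.get(e.from()), b = assign.get(e.to());
            if (a == b) intra[a]++; else { inter[a]++; inter[b]++; }
        }
        double mq = 0;
        for (int i = 0; i < k; i++)
            if (intra[i] + inter[i] > 0) mq += intra[i] / (intra[i] + inter[i] / 2);
        return mq;
    }

    /** Repeatedly move single entities to the cluster that raises the score. */
    static void improve(Map<String, Integer> assign, List<Edge> edges, int k) {
        boolean improved = true;
        while (improved) {
            improved = false;
            for (String entity : assign.keySet()) {
                int best = assign.get(entity);
                double bestScore = quality(assign, edges, k);
                for (int c = 0; c < k; c++) {
                    assign.put(entity, c);
                    double s = quality(assign, edges, k);
                    if (s > bestScore) { bestScore = s; best = c; improved = true; }
                }
                assign.put(entity, best);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical dependencies: A and B call each other, C and D call
        // each other, and there is a single link from B to C.
        List<Edge> edges = List.of(new Edge("A", "B"), new Edge("B", "A"),
                                   new Edge("C", "D"), new Edge("D", "C"),
                                   new Edge("B", "C"));
        // A deliberately poor start; TreeMap keeps iteration deterministic.
        Map<String, Integer> assign =
                new TreeMap<>(Map.of("A", 1, "B", 0, "C", 1, "D", 1));
        improve(assign, edges, 2);
        System.out.println(assign);  // {A=0, B=0, C=1, D=1}: the two call
                                     // cycles end up in separate clusters
    }
}

Because the climber accepts only single-entity moves, it can get stuck in local optima; that weakness is exactly what the genetic algorithms of Mancoridis et al. are meant to overcome. A proper modularization quality metric is introduced later in this thesis (Section 5.2.3).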

2.2 Program Migration

Program transformation is the act of changing one program into another. The languages in which the program being transformed and the resulting program are written are called the source and target languages, respectively. Program transformation is used in many areas of software engineering, including compiler construction, software visualization, documentation generation, and automatic software renovation. There are two main subjects related to our work: Migrating Procedural Legacy Systems to the Object-Oriented Paradigm and Re-Engineering Existing Object-Oriented Systems.

2.2.1 Migrating Procedural Legacy Systems to Object-Oriented Paradigm

Many researchers have proposed methodologies for migrating the architecture or the code of software systems written in a procedural language to comply with object-oriented paradigms. For instance, Martin and Müller [57] reported case studies on transliterating C source code to Java using the Ephedra method. The method includes three processes:


• Insertion of C function prototypes,

• Data type and type cast analysis, and

• Transliteration of the source code.

By applying the Ephedra method, parts of a C code base can be ported to the Java platform, which makes it possible to avoid a complete redevelopment of the business logic already present in the current application. The difficulty in using this method is that, since C is a procedural language and Java is an object-oriented one, not only do the syntax and semantics of the source code need to be translated, but a paradigm shift is also necessary to move from procedural to object-oriented code.

Wong and Li [96] proposed a stepwise approach for abstracting object-oriented designs from procedural source code:

• Abstract the program structure, such as procedure and variable call graphs, and group variables as well as procedures into classes by using structural similarity and pattern matching,

• Conduct dynamic code partitioning using an execution-slice-based technique, visualizing the various functionalities in the code, and

• Refine the object-oriented design generated in the previous step, if necessary, with the aid of simulation.

Web-enabling existing applications offers high leverage and a good return on investment. The web-enabling process may involve the following issues:

• Wrapping the existing legacy application with Internet technologies. The advantage of this process is that the previous investment in the legacy code remains intact. Also, by segregating the user interface from the business logic module of the legacy application, only what is required for making the application "Internet aware" is modified.


• It is important to establish a proof of concept of the proposed solution by web-enabling a part of the system instead of the whole. This in turn can help in defining a long-term strategy on the solution that will best suit the organization.

• An existing legacy application might need to be reconstructed to leverage the existing business process.

In [101], Zou and Kontogiannis presented a framework that addresses these issues in migrating legacy systems to a web-enabled environment by means of a CORBA wrapper and a SOAP/CORBA IDL translator. The migration process focuses on specifying the identified legacy components in XML, then wrapping them in CORBA objects, and finally deploying the distributed components into the application server. A scripting language encoded in an XML format can be used to allow thin clients to communicate with the legacy components.

2.2.2 Re-Engineering Existing Object-Oriented Systems

Computing environments are evolving from mainframe systems to distributed systems, and stand-alone programs that were developed using object-oriented technology are not suitable for these new environments. Hence, many researchers have addressed these issues by re-engineering existing object-oriented systems.

Tahvildari and Kontogiannis [80, 86] presented a framework for quality-based and quality-driven re-engineering of object-oriented systems. The framework adopts an incremental and iterative re-engineering process model that is driven by soft-goal interdependency graphs. The re-engineering process includes the following steps, as illustrated in Figure 2.2. First, the source code is represented as an Abstract Syntax Tree. The tree is further decorated, using a linker, with annotations that provide linkage, scope, and type information. Once the software artifacts have been understood, classified, and stored during the reverse engineering phase, their behavior becomes available to the system during the forward engineering phase.

[Figure: block diagram of the process. Source code is parsed into a high-level source code representation (ASG, AST, RSF, UML diagrams); driven by goal-oriented non-functional requirements, transformation rules are applied to produce new code, which is evaluated to yield the final system.]

Figure 2.2: The Block Diagram of the Quality-Based Re-engineering Process.

The forward engineering phase then aims to produce a new version of the legacy system that operates on the target architecture and addresses specific non-functional requirements. Finally, the framework uses an iterative procedure to obtain the new migrant source code by selecting and applying a transformation that leads to performance or maintainability enhancements. The transformation is selected from the soft-goal interdependency graphs. The resulting migrant system is then evaluated, and the step is repeated until the quality requirements are met.

Fanta and Rajlich [32] re-engineered an object-oriented program to improve the program structure and thus its maintainability. A deteriorated C++ application was restructured to move "misplaced" code and data from their original classes to the classes to which they naturally belong.

Gleich and Kohler [37] proposed an approach for transforming object-oriented legacy systems into modern framework-based architectures in order to improve their maintainability. They also provided a reference architecture for re-engineering tools and a few tool prototypes which were developed at Daimler-Benz.

Xu et al. [97] presented an approach to program restructuring at the functional level, based on the clustering technique with cohesion as the main concern. The approach focuses on automated support for identifying ill-structured or low-cohesion functions and on providing heuristic advice in both the development and evolution phases. Empirical observations showed that the heuristic advice provided by the approach can help software designers make better decisions about why and how to restructure a program.

2.3 Architecture Recovery

One of the areas in software architecture is architecture recovery through reverse engineering of existing implementations. Knowing the architecture of a software system plays an important role in the maintenance and evolution of the system: this knowledge helps the developer to know where in the system to make a modification and what parts of the system will be affected by the change. Moreover, in order to decompose an existing system, an efficient architecture recovery process is needed.


Figure 2.3: The Dali Workbench.


Since architecture recovery has received considerable attention recently, numerous articles have been published on this topic, and various frameworks, techniques, and tools have been developed. Basically, existing knowledge, obtained from experts and design documents, together with various tools, is necessary to solve the problem. For instance, Kazman and Carrière presented a workbench for architectural extraction called Dali [48]. Figure 2.3 illustrates Dali's architecture. In this workbench, a variety of lexical-based, parser-based, and profiling-based tools are used to examine a system and extract static and dynamic views, which are stored in a repository. Analysis of these views is supported by visualization and specific analysis tools, which enable an interaction with experts to control the recovery process until the software architecture is reconstructed.

Another architecture recovery approach, called the Architecture Recovery Method (ARM), was proposed by Guo et al. in [42]. ARM is a semi-automatic analysis method for reconstructing architectures based on the recognition of architectural patterns. Existing knowledge gained from design documentation is used to define queries for potential pattern instances, which are then applied automatically to extracted and fused source model views. Human evaluation is required to determine which of the detected pattern instances are intended and which are false positives or false negatives. ARM supports patterns at various abstraction levels and uses lower-level patterns to build higher-level and composite patterns. In this way, the approach is aimed particularly at systems that have been developed using design patterns whose implementations have not eroded over time.

Dominance analysis is a fundamental concept in compiler optimization and has been used extensively to identify loops in basic-block graphs [61]. It allows one to locate subordinated software elements in a rooted dependency graph. Dominance analysis on the call graphs of procedural-language applications has been used in reverse engineering to identify modules and subsystems and to recover system architectures [17, 26, 36]. Cimitile and Visaggio [26] first introduced dominance analysis as a method to identify related parts of an imperative system. This idea was further elaborated in [17, 36], where the authors applied dominance analysis to the call graphs of procedural-language applications to identify modules and subsystems. In this research, we explore the use of dominance analysis to identify services in an object-oriented application.
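For concreteness, the sketch below computes dominator sets on a small rooted call graph using the classic iterative data-flow formulation: a node n dominates m if every path from the root to m passes through n, so dom(m) equals {m} united with the intersection of dom(p) over all predecessors p of m. It is our own illustration, not the thesis's algorithm; the call graph is hypothetical, and it assumes every node is reachable from the root.

import java.util.*;

/** Iterative dominator computation on a rooted directed graph. */
public class Dominators {

    static Map<String, Set<String>> dominators(Map<String, List<String>> succ, String root) {
        // Collect all nodes and build the predecessor relation.
        Set<String> nodes = new HashSet<>(succ.keySet());
        succ.values().forEach(nodes::addAll);
        Map<String, Set<String>> pred = new HashMap<>();
        for (String n : nodes) pred.put(n, new HashSet<>());
        succ.forEach((n, targets) -> targets.forEach(t -> pred.get(t).add(n)));

        // Initialize: dom(root) = {root}; dom(n) = all nodes otherwise.
        Map<String, Set<String>> dom = new HashMap<>();
        for (String n : nodes)
            dom.put(n, n.equals(root) ? new HashSet<>(Set.of(root)) : new HashSet<>(nodes));

        // Refine dom(n) = {n} plus the intersection over predecessors,
        // until a fixed point is reached.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String n : nodes) {
                if (n.equals(root)) continue;
                Set<String> next = new HashSet<>(nodes);
                for (String p : pred.get(n)) next.retainAll(dom.get(p));
                next.add(n);
                if (!next.equals(dom.get(n))) { dom.put(n, next); changed = true; }
            }
        }
        return dom;
    }

    public static void main(String[] args) {
        // Hypothetical call graph: main calls a and b; both call c; c calls d.
        Map<String, List<String>> calls = Map.of(
                "main", List.of("a", "b"),
                "a", List.of("c"),
                "b", List.of("c"),
                "c", List.of("d"));
        // c dominates d (every call path to d goes through c); neither a nor b does.
        dominators(calls, "main").forEach((n, ds) -> System.out.println(n + " <- " + ds));
    }
}

The set of nodes dominated by a single node forms a self-contained region, which is why dominance is useful for carving out candidate modules, subsystems, and, in this thesis, services.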

2.4 Software Reuse

Software reuse enables applications to be developed faster and less expensively. It also offers numerous other benefits, including:

• Return on Investment. Components built or purchased by a company for one particular project can be reused in future projects, maximizing the company's return on investment.

• Adaptability. With component-based development (CBD), applications can be easily adapted to respond to changing business needs. The modular nature of components enables them to be easily modified, added, deleted, or swapped to provide new or enhanced functionality.

• Reliability. Reusing software components decreases the risk of operational glitches because the components have already been tested in other applications.

Current software reuse techniques include object orientation, component-based software development, and service-based development. In this section, we review two topics on software reuse which are relevant to this research work: Identification of Reusable Components in Source Code and Creation of Services from Legacy Systems.

2.4.1 Identification of Reusable Components in Source Code

Re-engineering legacy systems into component-based systems involves identifying reusable pieces, or components, of the legacy system so that the system can be restructured using those pieces. These components are modules of the system's code that perform certain business functions independently by processing a specific set of data. Once such components are identified in the system, they can be "mined", or extracted, and reused to build a component-based system [39].

The component identification exercise first requires the software developer to gain an understanding of the legacy system. A software system can be understood in terms of:

• the different elements of the system, such as programs, jobs, and data files, and

• the relationships that exist between those elements.

Different views can also be constructed from these elements and their relationships; for instance, a call graph can be created to show the relationships between the various programs. Once we gain an understanding of how the legacy system is built, we need to break the system down into components. This can be accomplished by selecting certain points within the system and expanding the boundaries of those points until all related system elements are included within the boundaries. The process of expanding these boundaries may be driven by system queries, documentation on the system, its maintenance history, and the knowledge of those who have worked with the system in the past.

Component identification approaches can be classified into two categories [39]: Data-Centric Identification and Event-Centric Identification. The data-centric approach involves analyzing the different types of data within the system, identifying the business functions performed on each type of data, and pinpointing where each business function is performed throughout the system. Once a unique, independent business function is identified and isolated, it can be segregated as a component. The event-centric approach is used to identify components in event-driven systems such as online Customer Information Control System (CICS) programs. Most online CICS programs are driven by events generated either by user input or by internal programs; in an event-driven system, any time an event takes place, specific code within the system is executed. Components can be identified by triggering an event and isolating the specific business functions that result from that event.
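As a minimal sketch of the data-centric idea, the following groups routines around the data entity each one mainly processes. It is our own illustration: the routine-to-entity map is hand-written here with hypothetical names, whereas real identification derives it by analyzing the legacy code and its data.

import java.util.*;

/** Toy data-centric component identification: group routines by the
 *  data entity they operate on. */
public class DataCentricGrouping {

    static Map<String, List<String>> groupByEntity(Map<String, String> touches) {
        Map<String, List<String>> components = new TreeMap<>();
        touches.forEach((routine, entity) ->
                components.computeIfAbsent(entity, k -> new ArrayList<>()).add(routine));
        return components;
    }

    public static void main(String[] args) {
        // routine -> the primary data entity it processes (hypothetical).
        Map<String, String> touches = new TreeMap<>(Map.of(
                "createInvoice", "Invoice",
                "printInvoice",  "Invoice",
                "addCustomer",   "Customer",
                "findCustomer",  "Customer"));
        // Two candidate components emerge: one around Customer, one around Invoice.
        System.out.println(groupByEntity(touches));
    }
}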


In this research, we focus only on data-centric identification approaches.

Caldiera and Basili introduced the Computer Aided Reuse Engineering (Care) system, an algorithmic approach to program understanding that supports the identification of reusable components using a user-defined, metrics-based reusability attribute model in the context of a procedural paradigm [18]. Etzkorn and Davis presented an approach for identifying reusable classes in object-oriented systems based on an understanding of the comments and identifiers in the source code [31]. Their tool, CHRis, uses natural-language techniques to help users decide whether a class implements certain useful functionality. In [4], Bansiya and Davis introduced a Quality Model for Object-Oriented Design (QMOOD), which measures functional, structural, and relational details of a system based on high-level attributes; in this model, reusability is calculated from coupling, cohesion, and design size. Shin and Kim proposed techniques for transforming an available object-oriented design into a component-based design [75]; their techniques focus on formal model specification and transformation.

None of these methods, however, provides hierarchical structures or proposes the reconstruction of the system's original architectural design. We aim to develop techniques for recovering high-level design, extracting the service hierarchy embedded in object-oriented systems, and migrating object-oriented designs to service-oriented architectures.

2.4.2 Creation of Services from Legacy Systems

A software service of a software system is an abstract resource that represents a capability of performing tasks that form a coherent functionality from the points of view of both the provider and the requester of the software [40]. A service should have a well-defined functional interface and be easy to discover and access [99]. A service-based development paradigm, or services model [34], is one in which components are viewed as services. In this model, services can interact with one another and be providers or consumers of data and behavior. Some of the defining characteristics of service-based technologies include modularity, availability, description, implementation-independence, and publication [34]. In the service-based development paradigm, a primary focus is on the definition of the interface needed to access a service (description) while hiding the details of its implementation (implementation-independence).

Gannod et al. described an architecture-based approach for creating services from legacy components by wrapping them with adapters, and for the subsequent integration of these services with service-requesting client applications [35]. The technique utilizes an architecture description language to describe components as services and achieves run-time integration using Jini [47] middleware technology. The methodology involves two steps for creating services: (i) specification of components as services, and (ii) generation of services using proxies via the construction of appropriate adapters and glue code. These services are subsequently registered and made available on a network.

Mehta and Heineman [59, 60] integrated the concepts of features, regression tests, and component-based software engineering (CBSE) into an approach for evolving procedural legacy systems. This methodology is divided into three parts: i) selecting test cases by considering the features that need evolution; ii) executing the selected test cases under code profilers to locate the source code that implements the features, then analyzing and refactoring the located source code to create components; and iii) comparing pre- and post-evolution maintenance costs.

2.5 Summary

In this chapter, we have reviewed the four principal research fields upon which this thesis is founded: Program Comprehension, Program Migration, Architecture Recovery, and Software Reuse. The aim of this chapter has been to provide a general background to existing and ongoing research in these areas. In subsequent chapters, we will present our own contributions in more detail, together with a detailed comparison of our approach to the most closely related work.

Chapter 3

Service-Oriented Componentization Framework

Since many competitive services have already been implemented in existing systems, leveraging the value of an existing system by exposing all or parts of it as services within a service-oriented environment has become a major concern in today's industry. The identification of functions suitable for exposure as services can be seen as an instance of a more generic problem: the functional decomposition of existing systems. To reuse the identified services and migrate the existing system's implementation into a service-oriented environment, one needs to package the identified services into well-documented and self-contained components during the forward engineering phase.

In this research, we develop a service-oriented componentization framework for Java software systems, which decomposes an existing object-oriented system in order to re-modularize the existing assets to support service functionality. More specifically, the proposed framework automatically supports: i) identifying critical business services embedded in an existing Java system, ii) realizing each identified service as a self-contained component, and iii) transforming the object-oriented design into a service-oriented architecture. We name the proposed componentization framework SOC4J. This chapter outlines the framework, while the details are discussed more thoroughly in subsequent chapters.

[Figure 3.1 depicts the four stages of the framework and their data and control flows: in Stage 1 (Architecture Recovery), Source Code Modeling produces source code models (facts) from the source code, and Architecture Modeling produces the architectural models; in Stage 2 (Service Identification), Top-Level Service Identification produces the top-level services and the atomic sub-services contained in each of them, and Low-Level Service Identification produces the validated services (top-level services and their low-level services); in Stage 3 (Component Generation), Component Generation produces self-contained components stored in a Self-Contained Component Repository; and in Stage 4 (System Transformation), Architecture Reconstruction produces the component-based system.]

Figure 3.1: The Architecture of the Service-Oriented Componentization Framework.

3.1 Framework Overview

The proposed SOC4J framework uses graph representations of an existing object-oriented software system, together with graph transformations, to identify the business services embedded in the system. In this research, we are interested in the reverse engineering challenge. Service identification is complicated by the usual obstacles of having to deal with potentially large and poorly structured existing systems. Identifying service candidates for packaging as reusable components requires analyzing massive amounts of legacy code, or at least graph representations of that code. Additionally, it requires the intervention of people with a background in the business domain to judge which functions are likely to make successful services. The identification of functions suitable for exposure as services can be seen as an instance of the more generic problem of functionally decomposing existing systems. Here, we are required to abstract the code, or an alternative code representation (e.g., XML or graphs), to higher-level representations that describe the system architecture in terms of its functional units.

Furthermore, the framework realizes each identified service as a self-contained component and reconstructs the object-oriented design into a service-oriented architecture. To reuse the identified services and migrate the existing system’s implementation into a component-based architecture, it is necessary to package the identified services into well-documented and self-contained components. Service packaging needs techniques for automatically extracting the relevant procedural elements from the existing system and creating an interface for each component. Also, the restructuring of object-oriented systems requires a comprehensive framework that relates refactoring operations and software transformations to non-functional requirements.

As illustrated in Figure 3.1, the proposed componentization framework is comprised of four stages: Architecture Recovery, Service Identification, Component Generation, and System Transformation. The following sections elaborate on each of these stages.

3.2 Architecture Recovery

Software architecture recovery aims at reconstructing views on the architecture as-built. Effective system reuse and evolution require both the “big picture” and the lower-level dependencies between portions of the source code. As noted above, identifying functions suitable for exposure as services requires abstracting the code, or an alternative code representation (e.g., XML or graphs), to higher-level representations that describe the system architecture in terms of its functional units.

In the architecture recovery stage, we aim to create a framework that provides methodological and technological steps to recover higher-level design and architecture representations of existing software systems based on source code artifacts. This includes the creation of a suitable representation of design and architectural models that reflect the functional decomposition of the system. To distinguish them from each other: design models are more detailed and refer to different parts of a system, whereas architectural models are more abstract and refer to the system as a whole. There are two goals we are trying to achieve at this stage: i) building complete data models for Java source code at different levels of abstraction to support a wide range of structural analysis and recovery, and ii) establishing a repository of relationships among classes and interfaces which can easily be queried in the service identification stage.

3.3 Service Identification

Identifying the critical business services embedded in an existing Java system is one of the primary tasks of the SOC4J framework. Essentially, the service identification process of the SOC4J framework identifies related modules in the system. This process is based on the analysis of the architectural information recovered in the previous stage. A business service of a software system is an abstract resource that represents a capability of performing tasks that form a coherent functionality from the points of view of both the provider and the requester. In order to clearly describe and automate the service identification process, we categorize the services embedded in an object-oriented system into two classes: i) top-level services, which are not used by another service but may contain a hierarchy of low-level services further describing the service, and ii) low-level services, which are underneath a top-level service and may be agglomerated with other low-level services to yield a new service with a higher level of granularity.

Furthermore, a formal description needs to be developed for each service. Such descriptions should document possible dependencies between service invocations, besides syntactic information on the number and types of parameters. Such descriptions are crucial for developers implementing applications based on the extracted services and should therefore be presented in a way that is understandable to them. In the service identification stage, we aim to identify both the top-level services and the low-level services embedded in an existing system. The proposed service identification approach is supported by a combination of top-down and bottom-up techniques. In the top-down portion of the process, we identify the top-level services and the atomic services underneath each top-level service. In the bottom-up portion, we aggregate the atomic services to identify services with a higher level of granularity, using graph transformations.

3.4 Component Generation

An effective way of leveraging the value of existing systems is to expose their functionalities as reusable components to a larger number of clients through well-defined component interfaces. Hence, the identified services should be packaged as components so that they can be deployed and thus invoked. Moreover, in order to migrate the existing system’s implementation into a component-based architecture, it might be necessary to package the identified services into components. If service packaging is required, it needs techniques for automatically extracting the relevant procedural elements from the existing system and creating an interface for each component.

The service-oriented architecture (SOA) encourages individual services to be self-contained. A self-contained component is a component that contains all the code necessary to implement its services and hence can be deployed independently. At the third stage of the proposed SOC4J framework, we realize each top-level service, and the low-level services underneath it, as self-contained components. More specifically, for each identified service, we extract all classes and interfaces that are necessary for implementing the service, generate an interface for the service, and package these classes/interfaces together with the interface as a JAR file. As Figure 3.1 depicts, the output of this stage is a repository of self-contained components.

The quality of the components is important for the success of a reuse-driven development process. Key qualities of good reusable components include correctness, complexity, observability, testability, customizability, and performance. However, most of these qualities are not directly measurable. In this thesis, we aim at assessing the reusability of the extracted components through the analysis of their interfaces and internal methods. Reusability is a high-level quality of software components and hence is the result of the combination and interaction of many low-level properties. We define a component reusability model that expresses reusability as a composition of quality properties such as complexity, observability, customizability, and external dependency.
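To make the packaging step concrete, the following is a minimal Java sketch of how a set of compiled class files could be bundled into a deployable JAR using the standard java.util.jar API. The file names are hypothetical and error handling is elided; the actual SOC4J packaging also includes the generated service interface and documentation.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class ComponentPackager {

    // Bundles the given compiled class files into a single JAR archive.
    public static void pack(String jarName, String... classFiles) throws IOException {
        try (JarOutputStream jar = new JarOutputStream(new FileOutputStream(jarName))) {
            for (String file : classFiles) {
                jar.putNextEntry(new JarEntry(file));
                try (FileInputStream in = new FileInputStream(file)) {
                    in.transferTo(jar);   // copy the class bytes into the archive
                }
                jar.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical component: a service interface plus its implementation class.
        pack("BookingService.jar", "IBooking.class", "Booking.class");
    }
}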

3.5 System Transformation

A component-based system is built by combining and interconnecting components; the component-based approach therefore supports reusability and flexibility. Based on the components that realize the identified business services, transforming the monolithic architecture of an existing object-oriented system into a more flexible service-oriented architecture is another goal of the proposed SOC4J framework. In the system transformation stage, we aim at reconstructing an existing Java system into a component-based system by using the components generated from the source system. A reference model for the component-based target system will be presented. The system transformation process must preserve the functionality of the original system. The parts of the system surrounding a component should use the newly extracted components, in order to avoid the situation where two sets of classes providing the same functionality exist in the same system.


As Figure 3.1 shows, the output of this stage is a component-based system providing the same functionality as the original system.

3.6 Summary

In this chapter, we outlined the proposed service-oriented componentization framework. The role of each stage of the framework has been discussed. We will present the techniques used within each stage in the subsequent chapters.

Chapter 4

Architecture Recovery

Software architecture recovery aims at reconstructing views on the architecture as-built. Knowing the architecture of a software system plays an important role in the maintenance and evolution of the system. This knowledge helps the engineer to know where in the system to make a modification and what parts of the system will be affected by the change. Moreover, in order to componentize an existing system, there is a need for an efficient architecture recovery process. The first stage of the service-oriented componentization framework is the architecture recovery stage. There are two goals we are trying to achieve at this stage:

• Building complete data models for Java source code at different levels of abstraction to support a wide range of structural analysis and recovery, and

• Establishing a repository of relationships among classes and interfaces which can easily be queried in the service identification stage.

This chapter discusses the two main processes contained in the architecture recovery stage: the Source Code Modeling process and the Architecture Modeling process. In Section 4.1, we discuss the UML representation of the XML schemas which we define in this thesis. We explain the source code modeling process in Section 4.2, while the architecture modeling process is discussed in Section 4.3. Finally, Section 4.4 summarizes this chapter.

4.1 XML Schema Representation

As designed, the output of each stage of the componentization framework is presented as XML documents. Before we delve into the processes of each stage in the framework, we need to find an understandable and formal way to present the XML schemas we define in each stage. UML [65] is the de facto standard for software development; therefore, a need arises to integrate XML schemas into UML-based software development processes. Not only is the production of XML schemas out of UML models required, but also the integration of XML schemas as input into the development process, because standard data structures and document types are part of the requirements [7]. In this section, we describe the UML representation of the XML schemas that we define in the rest of the thesis.

4.1.1 UML Profile for XML Schemas

Existing work on representing XML schemas in UML has emerged from approaches to platform-specific modeling in UML and transforming these models to XML schemas, with the recognized need for UML extensions to capture the peculiarities of XML schemas. Booch et al. first presented an approach to modeling XML schemas using UML notation in [11]. Although based on a predecessor of XML schemas, it introduced UML extensions addressing the modeling of elements and attributes, model groups, and enumerations that can also be found in recent approaches. Bernauer et al. [7] summarized and compared the recent main approaches to representing XML schemas in UML as follows:

• Carlson [19] described an approach based on XMI rules for transforming UML to XML schemas. Carlson introduced a UML profile which addresses most XML schema concepts, except for simple-content complex types, global elements and attributes, and identity constraints. Regarding semantic equivalence, the profile has some weaknesses in its representation of model groups, i.e., the sequence, choice, and all elements in XML schemas.

• Provost [67] addressed some of the weaknesses of [19] by covering the representation of enumerations and other restriction constraints, and of list and union type constructors, although the latter does not conform to UML.

• David Carlson [19] defined a UML profile for representing XML schemas that was based on the XML conceptual models discussed in [27]. This UML profile addressed some enhancements regarding simple types and notations.

• Routledge et al. [71] pointed out the importance of separating the conceptual schema (i.e., the platform independent model) from the logical schema (i.e., the platform specific model); this separation is not considered in the other approaches. They considered the logical schema as a direct, one-to-one representation of the XML schema in terms of a UML profile. The profile that they defined covers almost all concepts of XML schema, but several of its representations do not conform to UML.

• Bernauer et al. [8] adapted the approach proposed in [71], aiming at a one-to-one representation of XML schemas in a UML profile. Their approach was built on the existing UML profiles for XML schemas, with some improvements and extensions.

4.1.2 Representing XML Schemas in UML

By applying a UML profile, we represent the XML schemas defined in this research in UML notation. We propose three criteria for choosing an existing UML profile for XML schemas:

1. The UML profile provides a semantically equivalent representation of an XML schema in UML, supporting a bijective mapping between both representations. In order to satisfy this requirement, the profile has to address the whole range of XML schema concepts such that any XML schema can be expressed in UML.

2. The UML profile supports round-trip engineering, that is, transformation from XML schema to UML and back again without loss of schema information.

3. The UML profile maximizes the understandability of semantic concepts for users knowledgeable of UML but not of XML schema.

By examining the results of the evaluation performed in [7], we adopt the UML profile defined in [71] to represent XML schemas throughout this research work. The UML profile provided in [71] contains classes and associations that represent the constructions found in the XML schema specification [88]. It is intended that every concept in an XML schema has a corresponding representation in the UML profile (and vice versa). As a result, there is a one-to-one relationship between the logical (UML notation) and physical (XML schema notation) representations of a schema.

4.2 Modeling Source Code

Fact extraction from source code (i.e., finding pieces of information about the system) is a fundamental step of reverse engineering and often has to be performed first. That means that, before performing any high-level reverse engineering analysis or architecture recovery activities, the information available in the source code has to be extracted and aggregated in a fact base. Such a fact base forms the foundation for the further analysis tasks that are conducted next. We aim to build a complete data model set for Java source code at different levels of abstraction to support a wide range of structural analysis and recovery. These models are essential for representing the system at the source code level and for computing reusability attributes for each individual class. The source code models are presented as XML documents and form the Basic View (BView) of the system [51].

4.2.1 Approach

There are a number of existing meta-models for representing object-oriented software. Most of these are aimed at Object-Oriented Analysis and Design (OOAD), the most notable example being the Unified Modeling Language (UML). However, these meta-models represent software at the design level, whereas re-engineering requires information about software at the source code level. We propose an automated approach for modeling the entities of Java software systems at the source code level. The approach is based on the Java Compiler Compiler (JavaCC) [44], as depicted in Figure 4.1.


Figure 4.1: The Approach for Source Code Modeling.

Source code parser construction tools have been around for several years. The best known of these are the famous yacc [98] and lex [50] tools from the Unix domain and their GNU versions bison [10] and flex [33]. These tools, as well as their successors, allow a stream of input data to be parsed based on two constructs:

• Tokens. A token is a sequence of input characters that has meaning based upon the desired syntax. The first step in parser construction is to extract tokens from the input stream. This generally involves the specification of those tokens in some form of regular expressions. Token extraction is also known as scanning or lexing (for lexical analysis).

• Backus–Naur Form (BNF) Productions. A BNF production is a set of token sequences that has meaning based upon the desired syntax. For example, the string “2*3+4” can be abstractly interpreted as “INTEGER MULT INTEGER ADD INTEGER”. The second step in parser construction is to group the tokens together to form the valid sequences for the desired syntax.

JavaCC offers an excellent toolkit for generating parser classes in Java. JavaCC generates top-down, recursive-descent parsers. The top-down nature of JavaCC allows it to be used with a wider variety of grammars than other traditional tools, such as yacc and lex. JavaCC also contains all parsing information in one file (the JavaCC grammar file); the convention is to name this file with a .jj extension. The Interpreter in Figure 4.1 is composed of a set of parser classes which are generated by JavaCC. It parses the Java source code and outputs a set of raw data describing the facts. These raw data sets are passed to the Model Generator, which builds the source code models.
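To make the two constructs concrete, the following is a minimal sketch of what a fragment of a .jj grammar file might look like for the arithmetic example above. The token and production names are illustrative and are not taken from the actual SOC4J grammar.

// Illustrative fragment of a JavaCC grammar file (not the SOC4J grammar).
// Token definitions: regular expressions that turn characters into tokens.
TOKEN : {
    <INTEGER : (["0"-"9"])+>
  | <MULT    : "*">
  | <ADD     : "+">
}

// A BNF production: the valid token sequences for simple arithmetic,
// so that "2*3+4" is parsed as INTEGER MULT INTEGER ADD INTEGER.
void Expression() : {} {
    <INTEGER> ( ( <MULT> | <ADD> ) <INTEGER> )*
}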

4.2.2 Source Code Models


Figure 4.2: The Meta-Model for Java Package Models.

Source code models represent Java packages, source files, classes, and the methods defined in a class. We define four meta-models for source code models at different levels of abstraction: JPackage, JFile, JClass, and JMethod. As designed, source code models are exported and stored as XML documents. Therefore, these meta-models are XML schemas and are presented as UML models by applying the UML profile for XML schemas discussed in Section 4.1.2.

JPackage

JPackage is the XML schema for modeling Java packages. Figure 4.2 illustrates the JPackage XML schema in UML.

JFile

JFile is the XML schema for modeling Java source files. Figure 4.3 illustrates the JFile XML schema in UML.


Figure 4.3: The Meta-Model for Java Source File Models.

JClass

JClass is the XML schema for modeling Java classes or interfaces. Figure 4.4 illustrates the JClass XML schema in UML.


Figure 4.4: The Meta-Model for Java Class/Interface Models.

JMethod

JMethod is the XML schema for modeling Java methods defined in a class, as well as constructors of a class. Figure 4.5 illustrates the JMethod XML schema in UML.


Figure 4.5: The Meta-Model for Java Method/Constructor Models.
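As a concrete illustration of these meta-models, an instance document conforming to the JPackage schema of Figure 4.2 might look like the following. The package and type names are hypothetical, and the exact element spellings are assumed from the meta-model rather than reproduced from an actual SOC4J output.

<JPackage>
  <name>com.example.inventory</name>
  <class>Warehouse</class>
  <class>Item</class>
  <interface>Searchable</interface>
</JPackage>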

4.3 Modeling Architecture

In this thesis, the primary goal of architectural modeling is to establish a repository of relationships among classes and interfaces which can easily be queried in the service identification stage. The relationships among classes and interfaces occur at different levels of abstraction, such as the package level, class level, and method level. In the specific context of our work, we analyze relationships at the class level. Based on the source code models described in Section 4.2.2, we identify the relationships between the classes/interfaces and build two architectural models at different levels of abstraction, namely the Class/Interface Relationship Graph (CIRG) and the Class/Interface Dependency Graph (CIDG). In addition, reusability attributes for each class are computed and integrated into the graphs. The service identification and extraction tasks in the next stage are performed upon the transformation of these two graphs. The CIRG and CIDG are exported as XML documents and form the Structural View (SView) of the system [51].

4.3.1 Definitions of Class Relationships

We aim to identify class/interface relationships at the class level. In order to comply with UML, the types of relationships considered between two classes (or interfaces) in this thesis are inheritance, realization, association, aggregation, composition, and usage, which are adapted from the UML 2.0 superstructure specification [65]. We formalize the relationships so that we can automatically detect them in an implementation. In order to formalize class relationships at the implementation level, we extend the class relationship property set proposed in [41]:

Generalization Property. Given two classes, A and B, A may be a specialized form of B, or B may provide a contract that A agrees to carry out. We define the generalization property as follows:

GE : Class × Class → G, where G = {null, extends, implements}    (4.1)

Hence we have GE(A, B) ∈ {null, extends, implements}. GE(A, B) = extends if class A is a specialized form of class B; GE(A, B) = implements if B serves as the contract that A agrees to carry out; otherwise, GE(A, B) = null.

Exclusivity Property. An instance of class B involved at a given time in a relationship with an instance of class A can, or cannot, be in another relationship at the same time. We define the exclusivity property as follows:

EX : Class × Class → B, where B = {true, false}    (4.2)

Given two classes, A and B, EX(A, B) ∈ {true, false}. The value true states that an instance of class B can take part in another relationship with another instance of class A or of another class; the value false indicates that it cannot. The exclusivity property only holds at a given time and does not prevent possible transferals.

Invocation-Site Property. Instances of class A, involved in a relationship, send messages to instances of class B. We name the set of all possible invocation sites:

all = {field, arrayfield, collectionfield, parameter, arrayparameter, collectionparameter, localvariable, localarray, localcollection}    (4.3)

We distinguish three levels of invocation sites: fields, parameters, and local variables. Also, we distinguish “simple” invocation sites, arrays, and collections, because they imply different sets of programming idioms for their declarations and uses, which we need to individualize when detecting the relationships. We define the invocation-site property as follows:

IS : Class × Class → 2^all    (4.4)

Given two classes, A and B, IS(A, B) ⊆ all. The values of the IS property describe the invocation sites for messages sent from instances of class A to instances of class B. There can be no message sent from class A to class B, i.e., IS(A, B) = φ, or messages can be sent from A through a field (respectively a parameter, a local variable) of type B, an array field, or a field of type collection.

Lifetime Property. Given two classes, A and B, the lifetime property constrains the lifetimes of all instances of class B with respect to the lifetimes of all instances of class A. We define the lifetime property as follows:

LT : Class × Class → K, where K = {−, +}    (4.5)

Hence we have LT(A, B) ∈ {−, +}. In programming languages with garbage collection, LT(A, B) = + if all instances of class B are destroyed before the corresponding instances of class A, and LT(A, B) = − if they are destroyed after. LT(A, B) may take either value in K if the times of destruction of instances of classes A and B are unspecified.

Multiplicity Property. Given two classes, A and B, the multiplicity property specifies the number of instances of class B allowed in a relationship with class A. We express this property as follows:

MU : Class × Class → 2^(N ∪ {+∞})    (4.6)

Hence we have MU(A, B) ⊂ N ∪ {+∞}. For the sake of simplicity, we use an interval of the minimum and maximum numbers to represent multiplicity. Also, we only consider multiplicity at the target end of a relationship.

Once the class relationship properties are defined, we can formalize the considered binary class relationships at the implementation level as six conjunctions of the above five properties. Formalizations of the binary class relationships are important because i) they provide formal, language-independent definitions of the relationships for understanding and communication among software engineers, and ii) they are the basis of the detection algorithms needed to bridge the gap between implementation and design [41].


Inheritance Relationship. Given two classes, A and B, let A −→ B represent that there is an inheritance relationship between A and B, where A is the source class and B is the target class. The inheritance relationship signifies that class A shares the structure and behavior of class B and implies an “is-a-kind-of” relationship. We formalize the inheritance relationship as follows:

A −→ B = (GE(A, B) = extends) ∧ (GE(B, A) = null)    (4.7)



Realization Relationship. Given two classes, A and B, let A −→ B represent that there is a realization relationship between A and B, where A is the source class and B is the target class. The realization relationship signifies that class A must realize, or implement, the behavior specified by class B (in the Java case, B is an interface). We formalize the realization relationship as follows:

A −→ B = (GE(A, B) = implements) ∧ (GE(B, A) = null)    (4.8)



Association Relationship. Given two classes, A and B, let A −→ B represent that there is an association relationship between A and B, where A is the source class and B is the target class. The UML specifies that the association represents the ability of one instance of the source class to send a message to an instance of the target class [65]. This is typically implemented with a pointer or reference instance variable, although it might also be implemented as a method parameter or the creation of a local variable. We formalize the association relationship as follows:

A −→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧ (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧ (IS(A, B) ⊆ all) ∧ (IS(B, A) = φ) ∧ (LT(A, B) ∈ K) ∧ (LT(B, A) ∈ K) ∧ (MU(A, B) = [0, +∞]) ∧ (MU(B, A) = [0, +∞])    (4.9)



Aggregation Relationship. Given two classes, A and B, let A −→ B represent that there is an aggregation relationship between A and B, where A is the source class and B is the target class. By the UML specification [65], the aggregation relationship is the typical whole/part relationship. That is, an instance of the target class (the part) is a part of an instance of the source class (the whole). The aggregation relationship implies a “has-a” relationship and is exactly the same as an association, with the exception that instances cannot have cyclic aggregation relationships. We formalize the aggregation relationship as follows:

A −→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧ (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧ (IS(A, B) ⊆ {field, arrayfield, collectionfield}) ∧ (IS(B, A) = φ) ∧ (LT(A, B) ∈ K) ∧ (LT(B, A) ∈ K) ∧ (MU(A, B) = [0, +∞]) ∧ (MU(B, A) = [1, +∞])    (4.10)



Composition Relationship. Given two classes, A and B, let A −→ B represent that there is a composition relationship between A and B, where A is the source class and B is the target class. Again, by the UML specification [65], the composition relationship is exactly like aggregation, with the exception that the lifetime of the ‘part’ is controlled by the ‘whole’. This control may be direct or transitive. That is, the whole may take direct responsibility for creating or destroying the part, or it may accept an already created part and later pass it on to some other whole that assumes responsibility for it. We formalize the composition relationship as follows:

A −→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧ (EX(A, B) = true) ∧ (EX(B, A) = false) ∧ (IS(A, B) ⊆ {field, arrayfield, collectionfield}) ∧ (IS(B, A) = φ) ∧ (LT(A, B) = +) ∧ (LT(B, A) = −) ∧ (MU(A, B) = [1, +∞]) ∧ (MU(B, A) = [1, 1])    (4.11)



Usage Relationship. Given two classes, A and B, let A −→ B represent that there is a usage relationship between A and B, where A is the source class and B is the target class. The UML specifies that a usage relationship is one in which the client (the source) requires the presence of the supplier (the target) for its correct functioning or implementation [65]. Furthermore, the UML defines five types of usage relationships: i) the call relationship signifies that the source operation invokes the target operation, ii) the create relationship signifies that the source class creates one or more instances of the target class, iii) the instantiation relationship signifies that one or more methods belonging to instances of the source class create instances of the target class, iv) the responsibility relationship signifies that the client has some kind of obligation to the supplier, and v) the send relationship signifies that instances of the source class send signals to instances of the target class. We formalize the usage relationship as follows:

A −→ B = (GE(A, B) = null) ∧ (GE(B, A) = null) ∧ (EX(A, B) ∈ B) ∧ (EX(B, A) ∈ B) ∧ (IS(A, B) ⊆ all − {field, arrayfield, collectionfield}) ∧ (IS(B, A) = φ) ∧ (LT(A, B) ∈ K) ∧ (LT(B, A) ∈ K) ∧ (MU(A, B) = [1, +∞]) ∧ (MU(B, A) = [0, +∞])    (4.12)
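To ground the invocation-site property in code, consider the following hypothetical Java fragment (the class names are illustrative and not part of any system studied in this thesis). By the IS property, messages sent to Engine go through fields, which is consistent with the aggregation formalization (4.10), while messages sent to Logger go only through a parameter and a local variable, which is consistent with the usage formalization (4.12).

import java.util.ArrayList;
import java.util.List;

class Engine { void start() { } }
class Logger { void log(String msg) { System.out.println(msg); } }

class CarExample {
    // Invocation site "field": IS(CarExample, Engine) includes field.
    private Engine engine = new Engine();

    // Invocation site "collectionfield": a field of collection type.
    private List<Engine> spareEngines = new ArrayList<>();

    // Invocation sites "parameter" and "localvariable" only:
    // IS(CarExample, Logger) contains no field-level sites.
    void drive(Logger logger) {
        engine.start();
        Logger local = logger;
        local.log("driving");
    }
}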

4.3.2 Approach

The architecture modeling process identifies all relationships between the classes/interfaces and represents the identified relationships as directed graphs. The process also computes the basic reusability attributes for each class in the system. Figure 4.6 illustrates the architecture modeling process.


Figure 4.6: The Approach for Architecture Modeling.

As described before, the source code models built by the source code modeling process are exported as XML documents. First, these source code models are parsed by the XML Parser in Figure 4.6. Then, the Relationship Extractor identifies all the relationships described in Section 4.3.1, and the Metric Generator computes a set of metrics for each class/interface. We define a metric suite at the class level to represent the basic reusability attributes for each class in the system. The metric suite is presented in Table 4.1; the definition of each metric is adapted from SDMetrics [72]. Finally, the Graph Generator and Graph Transformer generate the CIRG and CIDG, respectively. We give formal definitions of the CIRG and CIDG in the following sections.

lines_code — The number of lines of non-comment code in a class.

num_attr — The number of attributes in a class. The metric counts all properties regardless of their type (data type, class, or interface), visibility, changeability (read-only or not), and owner scope (class-scope, i.e., static, or instance attribute). Not counted are inherited properties, and properties that are members of an association, i.e., that represent navigable association ends.

num_ops — The number of methods in a class. Includes all methods in the class that are explicitly modeled (overriding methods, constructors), regardless of their visibility, owner scope (class-scope, i.e., static), or whether they are abstract or not. Inherited operations are not counted.

num_pub_ops — The number of public methods in a class. Same as metric num_ops, but only counts operations with public visibility. Measures the size of the class in terms of its public interface.

num_nested_classes — The number of inner classes in a class.

setters — The number of operations with a name starting with ‘set’. Note that this metric does not always yield accurate results; for example, an operation settleAccount will be counted as a setter method.

getters — The number of operations with a name starting with ‘get’, ‘is’, or ‘has’. Again, note that this metric does not always yield accurate results; for example, an operation isolateNode will be counted as a getter method.

fan_in — The number of classes/interfaces that depend on this class. This metric counts incoming plain UML dependencies and usage dependencies.

fan_out — The number of classes/interfaces on which this class depends. This metric counts outgoing plain UML dependencies and usage dependencies.

Table 4.1: The Metric Suite at the Class Level
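As a sketch of how the name-based metrics could be computed, the following Java fragment counts setters and getters via reflection on a loaded class. It mirrors the definitions above (including their imprecision for names like settleAccount); the actual Metric Generator works on the extracted source code models rather than on compiled classes.

import java.lang.reflect.Method;

public class NameBasedMetrics {
    public static void main(String[] args) {
        int setters = 0, getters = 0;
        for (Method m : StringBuilder.class.getDeclaredMethods()) {
            String name = m.getName();
            if (name.startsWith("set")) setters++;                 // metric setters
            if (name.startsWith("get") || name.startsWith("is")
                    || name.startsWith("has")) getters++;          // metric getters
        }
        System.out.println("setters=" + setters + ", getters=" + getters);
    }
}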


4.3.3 Class/Interface Relationship Graph

The CIRG captures the UML-compliant relationships explained in Section 4.3.1. The formal definition of the CIRG is given as follows:

Definition 4.1. A Labeled Directed Graph (LDG) is a tuple Γ(V, E, LV, LE, lV, lE), where V is a set of nodes (or vertices), E is a set of edges (or arcs), LV is a set of node labels, LE is a set of edge labels, lV : V → LV is a label function that maps nodes to node labels, and lE : E → LE is a label function that maps edges to edge labels.

Definition 4.2. The Class/Interface Relationship Graph (CIRG) of an object-oriented system is an LDG as defined in Definition 4.1, where V is the set of all classes/interfaces of the system, lV(v) returns the full name (i.e., the package name concatenated with the class or interface name) of v for any v ∈ V, E = {(v, w) ∈ V × V | v references w}, and lE(e) returns the types of relationships between the source node and target node of e for any e ∈ E. The type of a relationship is one of IN, RE, AS, AG, CO, and US, which represent inheritance, realization, association, aggregation, composition, and usage, respectively.

Each class or interface of a Java system represents a node of the CIRG of the system. We name a node in the CIRG an RClass, and each node is presented and exported as an XML document. The XML schema for each node is depicted in Figure 4.7. The XML schema shows that four types of information about a CIRG node are captured:

• Property. The property field records the name, the type (i.e., class or interface), the package name, and the Java source file name of the corresponding class or interface.

• Characteristics. The characteristics field records the accessibility (i.e., public, protected, or private) and the implementation status (i.e., concrete class or abstract class) of the corresponding class or interface.

• Metrics. The metrics field records the values of the metrics in Table 4.1 for the corresponding class or interface.

• Relationships. The relationships field records all classes or interfaces which have one of the defined relationships with the corresponding class or interface. The type and the direction of each relationship are also stored.


Figure 4.7: The UML Representation of XML Schema for Nodes in the CIRG.
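A minimal in-memory counterpart of the LDG of Definition 4.1 might look like the following Java sketch, where node labels hold fully qualified class names and edge labels hold sets of relationship types. The class and method names are illustrative, not the SOC4J implementation.

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// A labeled directed graph: nodes carry labels of type NL, edges labels of type EL.
class LabeledDigraph<NL, EL> {
    private final Map<String, NL> nodeLabels = new HashMap<>();
    private final Map<String, Map<String, EL>> successors = new HashMap<>();

    void addNode(String id, NL label) {
        nodeLabels.put(id, label);
        successors.putIfAbsent(id, new HashMap<>());
    }

    void addEdge(String from, String to, EL label) {
        successors.get(from).put(to, label);   // lE((from, to)) = label
    }

    NL nodeLabel(String id) { return nodeLabels.get(id); }

    EL edgeLabel(String from, String to) { return successors.get(from).get(to); }
}

class CirgDemo {
    public static void main(String[] args) {
        // A CIRG-style graph: edge labels are sets of relationship types.
        LabeledDigraph<String, Set<String>> cirg = new LabeledDigraph<>();
        cirg.addNode("A", "com.example.A");
        cirg.addNode("B", "com.example.B");
        cirg.addEdge("A", "B", Set.of("IN"));          // A inherits from B
        System.out.println(cirg.edgeLabel("A", "B"));  // prints [IN]
    }
}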


4.3.4 Class/Interface Dependency Graph

Class dependencies occur when one class uses the services of another class. For example, this can happen when a class inherits from another, has an attribute whose type is another class, or when one of its methods calls a method on an object of another class. Given two classes, v and w, let v ⇝ w represent that class v depends upon class w. We formalize the class dependency as follows:

v ⇝ w = v −→IN w ∨ v −→RE w ∨ v −→AS w ∨ v −→AG w ∨ v −→CO w ∨ v −→US w    (4.13)

where −→IN, −→RE, −→AS, −→AG, −→CO, and −→US denote the inheritance, realization, association, aggregation, composition, and usage relationships formalized in (4.7)–(4.12), respectively. Now, we are ready to give the formal definition of the CIDG of an object-oriented system:

Definition 4.3. The Class/Interface Dependency Graph (CIDG) of an object-oriented system is an LDG as defined in Definition 4.1, where V is the set of all classes/interfaces of the system, lV(v) returns the full name (i.e., the package name concatenated with the class or interface name) of v for any v ∈ V, E = {(v, w) ∈ V × V | v ⇝ w}, LE = φ, and hence lE(e) returns an empty label for any e ∈ E.

Again, each class or interface of a Java system represents a node of the CIDG of the system. We name a node in the CIDG a DClass, and each node is presented and exported as an XML document. The XML schema for each node is depicted in Figure 4.8. The XML schema shows that four types of information about a CIDG node are captured:

• Property. The property field records the name, the type (i.e., class or interface), the package name, and the Java source file name of the corresponding class or interface.

• Characteristics. The characteristics field records the accessibility (i.e., public, protected, or private) and the implementation status (i.e., concrete class or abstract class) of the corresponding class or interface.

• Metrics. The metrics field records the values of the metrics in Table 4.1 for the corresponding class or interface.

• Dependency. The dependency field records all classes or interfaces on which the corresponding class or interface depends, and all classes or interfaces that depend on the corresponding class or interface.


Figure 4.8: The UML Representation of XML Schema for Nodes in the CIDG.
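Following (4.13), the CIDG can be derived from the CIRG simply by forgetting the edge labels. The following hedged Java sketch illustrates the idea over a plain adjacency-map representation (the class names are illustrative); it also shows how fan_out falls out of the derived graph.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class CidgDerivation {
    public static void main(String[] args) {
        // CIRG edges: source -> (target -> set of relationship types).
        Map<String, Map<String, Set<String>>> cirg = new HashMap<>();
        cirg.put("Car", Map.of("Vehicle", Set.of("IN"), "Engine", Set.of("AG", "US")));
        cirg.put("Vehicle", Map.of());
        cirg.put("Engine", Map.of());

        // CIDG edges: v depends on w iff any of the six relationships holds (4.13).
        Map<String, Set<String>> cidg = new HashMap<>();
        cirg.forEach((v, targets) -> cidg.put(v, new HashSet<>(targets.keySet())));

        // fan_out(v) is the out-degree of v in the CIDG.
        cidg.forEach((v, ws) -> System.out.println(v + " fan_out=" + ws.size()));
    }
}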


4.3.5 An Example: Car Rental System

In order to clarify the definitions and algorithms proposed in this thesis, we give examples based on a hypothetical software system at appropriate places. The hypothetical system is a Car Rental System (CRS), which consists of agents, customers, and a vehicle repository. The CRS provides two main business services: i) booking cars, and ii) evaluating cars based on the driving records of the customers. Figure 4.9 shows the CIRG of the CRS system, which captures all the class relationships defined in Section 4.3.1 for the CRS system.

[Figure 4.9 shows the classes/interfaces of the CRS across the packages com.uwstar.crs (VehicleEvaluation, Booking, IBooking, VehicleRepository), com.uwstar.crs.person (Person, Agent, Dealer, Customer), com.uwstar.crs.record (Record, CreditRecord, DrivingRecord), com.uwstar.crs.training (TrainingPlan, TrainingCourse), and com.uwstar.crs.vehicle (Vehicle, Car, SUV, Truck), connected by labeled relationship edges.]

Figure 4.9: The CIRG of the Car Rental System (CRS).


Figure 4.10 shows the CIDG of the CRS system. Each node represents a class/interface of the CRS system, and an edge between two classes/interfaces represents a dependency between them. By their definitions, the CIRG is a UML-compliant model, and the CIDG is a further abstraction of the CIRG. That is, the CIRG and the CIDG model the structure of an object-oriented software system at different levels of abstraction.


Figure 4.10: The CIDG of the Car Rental System (CRS).

4.4 Summary

We have discussed the source code modeling process and the architecture modeling process contained in the architecture recovery stage of the SOC4J framework. The source code modeling process builds a complete data model set for Java source code at different levels of abstraction. Based on these data models, the architecture modeling process establishes a repository of relationships among classes and interfaces which can easily be queried in the next stage of the SOC4J framework.

Chapter 5

Service Identification

An effective way of leveraging the value of legacy systems is to expose their functionalities as services to a larger number of clients. Identifying the critical business services embedded in an existing Java system is one of the primary tasks of the proposed SOC4J framework. This is done in the service identification process of the SOC4J framework, which is based on the analysis of the architectural information recovered in the previous chapter. This chapter discusses the service identification strategy and the algorithms that are used to identify critical business services embedded in an existing object-oriented system. In Section 5.1, we discuss how a service is described and modeled. We introduce the supporting techniques used in the service identification process in Section 5.2. The service identification process itself is presented in Section 5.3. Finally, we give a summary of this chapter in Section 5.4.

5.1 Service Representations

A business service within a software system is an abstract resource that represents a capability of performing tasks that form a coherent functionality from the points of view of both the provider and the requester [40]. We categorize the services that are embedded in an object-oriented system into two categories:

• Top-Level Services (TLS). A top-level service is a service that is not used by any other service of the system. However, it may contain a hierarchy of low-level services that further describe the service. From the requester’s point of view, top-level services are services provided by the system that can be accessed independently; top-level services are hence independent from each other.

• Low-Level Services (LLS). A low-level service is a service that is underneath a top-level service and may be agglomerated with other low-level services underneath the same top-level service to yield a new service with a higher level of granularity (i.e., the desired business result).

The SOC4J framework is designed to identify both the top-level services and the low-level services embedded in an existing object-oriented system. In order to clearly describe and automate the identification process, we describe an identified service (either a top-level service or a low-level service) as a tuple:

(name, CF, SHG)

In the above tuple, name is the name of the service. CF is the facade class set of the service; it contains the classes/interfaces that directly provide the functionality of the service to the outside world. SHG is the Service Hierarchy Graph (SHG) of the top-level service represented by the tuple. The SHG is defined as follows:

Definition 5.1. The Service Hierarchy Graph (SHG) associated with a top-level service is a rooted LDG, where the root, r ∈ V, represents the top-level service, V \ r represents the set of low-level services contained in the top-level service, lV(v) returns the CF set of v for any v ∈ V, E = {(v, w) ∈ V × V | v contains w}, LE = φ, and hence lE(e) returns an empty label for any e ∈ E.


The SHG shows the structural relationships between the services underneath a top-level service. It gives a high-level representation of services that is understandable by both developers and business experts. Furthermore, the SHG describes the modularization of its corresponding top-level service. There is no SHG associated with a low-level service, that is to say, SHG = φ for a low-level service; this is because each low-level service is already represented in the SHG of its top-level service. The SHGs of all top-level services of an object-oriented software system form the service view (ServView) of the system. The identified services (represented as tuples) are exported and stored as XML documents. The XML schema for services is illustrated in Figure 5.1.


Figure 5.1: The UML Representation of XML Schema for a Service.
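In code, the service tuple could be represented roughly as in the following Java sketch. The type names are illustrative, and the SHG is reduced to a plain parent-to-children map for brevity (the real SHG is a rooted LDG whose node labels are CF sets).

import java.util.Map;
import java.util.Set;

// (name, CF, SHG): a service name, its facade class set, and, for a top-level
// service, the hierarchy of its low-level services (null for a low-level service).
record Service(String name, Set<String> facadeClasses, Map<String, Set<String>> shg) { }

class ServiceDemo {
    public static void main(String[] args) {
        Service booking = new Service(
                "Booking",
                Set.of("com.uwstar.crs.IBooking"),
                Map.of("Booking", Set.of()));   // root with no low-level services yet
        System.out.println(booking);
    }
}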

5.2 Supporting Concepts

The proposed service identification approach involves a set of techniques such as graph transformations, dominance analysis on directed graphs, and evaluation of the modularization of a system that is represented by directed graphs. It is helpful to introduce these techniques prior to explaining the service identification process.


5.2.1 Graph Techniques

Graphs can be used to describe complex object structures in a mathematical way. In the context of software engineering, we can use graphs to formalize object-oriented languages and concepts, especially the UML. In this thesis, we apply graph techniques to assist in service identification. The important graph concepts and techniques involved in this thesis are reviewed as follows:

Definition 5.2. Let G = (V, E) be a directed graph (DG), where V represents all nodes (or vertices) in G and E represents all edges (or arcs) in G. Given a node v ∈ V, the in-degree of v is the number of edges directed into v and the out-degree of v is the number of edges directed out of v. A root of G is a node whose in-degree is zero. G is said to be a rooted directed graph iff there is only one root in V.

Definition 5.3. Let G = (V, E) be a DG, where V represents all nodes in G and E represents all edges in G. Given two nodes v ∈ V and w ∈ V, a path from vertex v to vertex w is a sequence of consecutive edges between v and w. A cycle is a path from a node to the same node. Node w is said to be reachable from node v if there is a path from v to w. G is a directed acyclic graph (DAG) iff there is no cycle in G.

Definition 5.4. A rooted tree is a DG G = (V, E), where V represents all nodes in G and E represents all edges in G, such that 1. there is a unique node in V (called the root) which has in-degree 0; 2. every node in V except the root has in-degree 1; and 3. there is a path from the root to every other node in G.

Definition 5.5. Let G = (V, E) be a DG, where V represents all nodes in G and E represents all edges in G. G is connected if the underlying undirected graph of G is connected, while G is strongly connected if there is a path in G between every pair of nodes in V.

Definition 5.6. Let G = (V, E) be a DG, where V represents all nodes in G and E represents all edges in G. A connected component of G is a maximal (though not necessarily maximum) connected subgraph of G. A strongly connected component of G is a maximal (though not necessarily maximum) strongly connected subgraph of G. A rooted component is a subgraph of G that consists of a unique root and the collection of all nodes w such that there is a path from the root to w.

Definition 5.7. Let G = (V, E) be a DG, where V represents all nodes in G and E represents all edges in G. A clique in G is a collection of nodes in V such that each pair of nodes in the collection is joined by an edge. A k-clique is a clique in which the number of nodes is k.

Figure 5.2: An Example of a Directed Graph.

For example, given the directed graph G in Figure 5.2, there are two connected components: graphs (a) and (b) in Figure 5.3. The only strongly connected component of G is graph (c) in Figure 5.3. Note that the subgraphs {2, 5, 7} and {5, 6, 7} are not strongly connected components of G, because they are not maximal. Graphs (d) and (e) in Figure 5.3 are two rooted components of graph (a) in Figure 5.3. The set {2, 3, 7} is a 3-clique in graph G.


Figure 5.3: (a) A connected component of the directed graph G in Figure 5.2. (b) The other connected component of G. (c) The only strongly connected component of G. (d) A rooted component of graph (a). (e) The other rooted component of graph (a).
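As a sketch, the rooted component of a node (Definition 5.6) can be computed with a simple depth-first traversal; the adjacency-list representation and method names below are assumptions for illustration only.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RootedComponent {
    // Returns all nodes reachable from root, i.e., the rooted component of root.
    static Set<Integer> rootedComponent(Map<Integer, List<Integer>> adj, int root) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(List.of(root));
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (seen.add(v)) {
                for (int w : adj.getOrDefault(v, List.of())) stack.push(w);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> g =
                Map.of(1, List.of(2, 3), 2, List.of(3), 3, List.of());
        System.out.println(rootedComponent(g, 1));   // prints the set {1, 2, 3}
    }
}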

5.2.2 Dominance Analysis

Dominance analysis is a fundamental concept in compiler optimization and has been used extensively to identify loops in basic-block graphs [61]. It allows one to locate subordinated software elements in a rooted dependency graph. Dominance analysis on the call graphs of procedural-language applications has been used in reverse engineering to identify modules and subsystems and to recover system architectures [17, 26, 36]. In this thesis, we explore the use of dominance analysis on SHGs, which assists us in identifying the low-level services underneath a top-level service. Dominance is a relation between nodes in a rooted directed graph. This relation can be formally defined as follows:

Definition 5.8. Let G = (V, E, r) be a rooted directed graph, where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G. Given any two different nodes v ∈ V and w ∈ V, node v dominates node w, written v dom w, iff every path from the root r to w contains v. Node v directly dominates node w, written v ddom w, iff v dom w and all other nodes that dominate w also dominate v. Node v strongly directly dominates node w, written v sddom w, iff v ddom w and v is a predecessor of w.

Definition 5.9. Let G = (V, E, r) be a rooted directed graph, where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G. The dominance tree corresponding to G is a tree T = (V, Ed, r) where Ed = {(v, w) ∈ V × V | v ddom w ∨ v sddom w}. A ddom subtree of T is a subtree whose root has an incoming ddom edge. An sddom subtree of T is a subtree whose root has an incoming sddom edge. A consolidation subtree of the dominance tree is a subtree that contains only sddom edges. A maximal consolidation subtree is a maximal such subtree.


Figure 5.4: (a) A Simple Directed Graph. (b) The Dominance Tree Corresponding to the Graph in (a). (c) The Two Maximal Consolidation Subtrees of the Dominance Tree in (b).

Figure 5.4 shows a simple rooted directed graph, the corresponding dominance tree, and the maximal consolidation subtrees in the dominance tree. Note that the subtree {6, 9} is a ddom


subtree and {2, 4, 5, 8} is a sddom subtree. The subtree {7, 10} is a consolidation subtree but not a maximal consolidation subtree, because it is not a maximal subtree that contains only sddom edges. In Figure 5.4, the dominance tree is constructed from an acyclic graph. However, this is not a necessary condition: we can construct a dominance tree from every directed graph as long as it is rooted. By Definitions 5.8 and 5.9, we can observe the following properties of dominance trees :

Property 5.1. Given a rooted directed graph G = (V, E, r), where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G, let T be the dominance tree corresponding to G. For each node (except the root) in a subtree (either a ddom subtree or a sddom subtree) of T, there is no incoming edge in E from any node outside the subtree.

Property 5.2. Given a rooted directed graph G = (V, E, r), where V represents all nodes in G, E represents all edges in G, and r ∈ V is the unique root node of G, let T be the dominance tree corresponding to G. For each node (except the root) in a consolidation subtree of T, there is no incoming edge in E from any other node (either inside or outside the subtree) except its parent in T.

In the analysis process of reverse engineering, it is essential to have an effective way of abstracting information. The dominance tree provides such an abstraction. More importantly, it represents a high-level modularization of the software system through its branches. Each branch of the dominance tree represents a concept or high-level functionality of the system. In the context of object-oriented design, one benefit of using dominance trees in program comprehension is the reduction of the visualization complexity of the class dependency graph by reducing a large number of edges. In the class dependency graph of a real-world software system, a class may be referenced by hundreds of classes, and a reduction to a single edge in the dominance tree greatly clarifies the graphic.
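As an illustration of Definition 5.8, the following sketch computes dominance directly from its definition: node v dominates node w exactly when removing v leaves w unreachable from the root. This O(|V|·|E|) formulation is for exposition only, under the assumption that node ids are non-negative; faster immediate-dominator algorithms exist, and this is not presented as the algorithm used by the framework.

import java.util.*;

public class Dominance {

    // nodes reachable from root when node `skip` is removed (-1 = remove nothing)
    static Set<Integer> reachable(Map<Integer, List<Integer>> adj, int root, int skip) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        if (root != skip) { stack.push(root); seen.add(root); }
        while (!stack.isEmpty()) {
            int v = stack.pop();
            for (int w : adj.getOrDefault(v, List.of()))
                if (w != skip && seen.add(w)) stack.push(w);
        }
        return seen;
    }

    // for each node v, the set of (other) nodes that v dominates:
    // w is dominated by v iff w is unreachable from the root once v is removed
    static Map<Integer, Set<Integer>> dominators(Map<Integer, List<Integer>> adj, int root) {
        Set<Integer> all = reachable(adj, root, -1);
        Map<Integer, Set<Integer>> dom = new HashMap<>();
        for (int v : all) {
            Set<Integer> without = reachable(adj, root, v);
            Set<Integer> dominated = new HashSet<>(all);
            dominated.removeAll(without);   // lost when v is removed
            dominated.remove(v);            // dominance is between different nodes
            dom.put(v, dominated);
        }
        return dom;
    }
}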


5.2.3 Modularization Quality Metric

The modularization quality (MQ) metric was first introduced in [54]. It has been used in a number of software engineering projects to evaluate the quality of software modularization achieved by graph partitioning [24, 76]. Basically, the MQ metric measures the difference between the average intra-connectivity and inter-connectivity of a system and shows how well the system is structured. In this thesis, we use the MQ metric to evaluate how well a top-level service is modularized by its low-level services. Let C(G1, G2, ..., Gk) be a partition of a given graph G(V, E), where V represents all nodes in G and E represents all edges in G. The MQ metric of the system, which is represented by the graph G, is defined as follows :

    MQ(C, G) = ( Σ_{i=1}^{k} s(Gi, Gi) ) / k  −  ( Σ_{i=1}^{k−1} Σ_{j=i+1}^{k} s(Gi, Gj) ) / ( k(k−1)/2 )    (5.1)

The function s() used in Formula (5.1) is defined as the ratio of the actual number of edges between two subsets of V of graph G with respect to the maximum number of possible edges between those two sets. Let U and W be two subsets of V (i.e., U ⊆ V and W ⊆ V ); then we have

    s(U, W) = e(U, W) / ( |U| · |W| )    (5.2)

where e(U, W) denotes the number of edges connecting a vertex in U to a vertex in W. The MQ metric determines the quality of the modularization quantitatively as the trade-off between inter-connectivity and intra-connectivity of subsystems. This trade-off is based on the assumption that well-designed software systems are organized into cohesive subsystems that are loosely interconnected. Hence, the MQ metric is designed to reward the creation of highly cohesive clusters, and to penalize excessive coupling between clusters. The value of the MQ metric is


between −1 (no internal cohesion) and 1 (no external coupling). A straightforward consequence is that a higher MQ value can be interpreted as better modularization, since it corresponds to a partition with either fewer edges connecting vertices from distinct blocks, or more edges lying within the same blocks of the partition, which is what most clustering or modularization algorithms aim to achieve [24].
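A minimal sketch of the MQ computation of Formulas (5.1) and (5.2) follows, assuming the partition is given as a list of node sets and the graph as a list of directed edges; all names are illustrative.

import java.util.*;

public class MQCalculator {

    // s(U, W) = e(U, W) / (|U| * |W|), Equation (5.2); assumes U and W non-empty
    static double s(Set<Integer> u, Set<Integer> w, List<int[]> edges) {
        int e = 0;
        for (int[] edge : edges)
            if (u.contains(edge[0]) && w.contains(edge[1])) e++;
        return (double) e / (u.size() * w.size());
    }

    // MQ(C, G), Equation (5.1): average intra-connectivity minus
    // average inter-connectivity over a partition C = (G1, ..., Gk)
    static double mq(List<Set<Integer>> partition, List<int[]> edges) {
        int k = partition.size();
        double intra = 0.0, inter = 0.0;
        for (int i = 0; i < k; i++) {
            intra += s(partition.get(i), partition.get(i), edges);
            for (int j = i + 1; j < k; j++)
                inter += s(partition.get(i), partition.get(j), edges);
        }
        return intra / k - (k > 1 ? inter / (k * (k - 1) / 2.0) : 0.0);
    }
}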

5.3 The Proposed Processes

In the SOC4J framework, we aim to identify critical business services embedded in an existing Java system. Our service identification process, as shown in Figure 5.5, is supported by a combination of top-down and bottom-up techniques.

[Figure: the two processes of the service identification stage. Top-level service identification: the CIDG transformation (from Stage 1) produces MCIDGs, which feed top-level service candidate generation and then service validation, yielding validated top-level services and their atomic services (described in SHGs). Low-level service identification (service aggregation): each SHG passes through SHG transformation, dominance tree generation, dominance tree reduction, and SHG reconstruction; the loop repeats until the termination criteria are satisfied, and the results flow to Stage 3.]

Figure 5.5: Processes in the Service Identification Stage.


In the top-down portion of the process, we identify the top-level services and the atomic services (to be discussed later) underneath each top-level service. In the bottom-up portion, we aggregate the atomic services to identify services with a higher level of granularity (reusable services). We will delve into these two portions in the subsequent two sections.

5.3.1 Top-Level Service Identification

The top-level service identification process is the top-down portion of the proposed service identification process. According to the definition of a top-level service (introduced in Section 5.1), the top-level services of a software system partition the system into independent parts. Each of these independent parts represents a service to the outside world, from the user's point of view. We identify the services of a system by starting with its top-level services, and then extracting a service hierarchy for each top-level service to identify the low-level services underneath it.

Algorithm 5.1: CIDG-Transformation
Input: CIDG : The CIDG of the system.
Output: MCIDGs : A set of MCIDGs.

    // decompose the CIDG into connected components
    MCIDGs ← φ;
    CGraphs ← ConnectedComponents(CIDG);
    // decompose each connected component into a set of rooted components
    foreach graph g ∈ CGraphs do
        RGraphs ← RootedComponents(g);
        MCIDGs ← MCIDGs ∪ RGraphs;
    end

To identify the top-level services of an existing object-oriented system, the first step is to identify the entry points of the system. In Chapter 4, we have modeled the existing system as


directed graphs : the class/interface relationship graph (CIRG) and the class/interface dependency graph (CIDG). At this stage, we decompose the CIDG into a set of connected components, each with a unique root, such that each component is an independent subgraph of the CIDG. Algorithm 5.1 describes the decomposition process. In Algorithm 5.1, function ConnectedComponents() computes and returns all connected components of the given directed graph, while function RootedComponents() decomposes a connected directed graph into a set of rooted components. We name each of the rooted components a modularized CIDG (MCIDG). Essentially, Algorithm 5.1 applies a set of graph transformation rules to transform the CIDG into a set of rooted components (i.e., MCIDGs). Note that the output MCIDGs are subgraphs of the CIDG, and each node in an MCIDG represents a single class or interface of the system. No other class or interface in the system depends upon the unique root of an MCIDG. Consequently, the unique root of each MCIDG might represent an entry point of the system, and each MCIDG might therefore embed a top-level service represented by its root. As we have mentioned, each node of an MCIDG contains only one class or interface. At this stage, we consider the root of each MCIDG as a top-level service candidate and the other nodes as the low-level service candidates underneath it. The second step of the top-level service identification is to generate the top-level service candidates from the MCIDGs. This is achieved by performing three tasks for each top-level service candidate represented by an MCIDG : i) computing the facade class set, ii) building the SHG of the top-level service candidate, and iii) describing the candidate as the tuple that we have defined in Section 5.1. The final step of the top-level service identification is to validate the top-level service candidates and assign a meaningful name to each accepted top-level service. This is a user-involved procedure. The user retrieves the functionality provided by the candidate through examining the classes/interfaces in its facade class set. Based on the functionality, the user can make a decision on the candidate.
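The RootedComponents() step of Algorithm 5.1 can be sketched in Java as follows. This is a minimal illustration, assuming the connected component is given as an adjacency list keyed by fully qualified class/interface names: every node of in-degree zero is a root, and its rooted component collects all nodes reachable from it. All names are illustrative, not taken from the JComp implementation.

import java.util.*;

public class RootedComponents {

    static List<Set<String>> rootedComponents(Map<String, List<String>> adj) {
        // compute in-degrees over all nodes of the component
        Map<String, Integer> inDegree = new HashMap<>();
        for (String v : adj.keySet()) inDegree.putIfAbsent(v, 0);
        for (List<String> targets : adj.values())
            for (String w : targets) inDegree.merge(w, 1, Integer::sum);

        List<Set<String>> components = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {                 // a root: no class depends on it
                Set<String> comp = new LinkedHashSet<>();
                Deque<String> stack = new ArrayDeque<>();
                stack.push(e.getKey());
                while (!stack.isEmpty()) {
                    String v = stack.pop();
                    if (comp.add(v))
                        for (String w : adj.getOrDefault(v, List.of())) stack.push(w);
                }
                components.add(comp);                // one MCIDG per root
            }
        }
        return components;
    }
}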


Algorithm 5.2: Top-Level Service Identification
Input: CIDG : The CIDG of the system.
Output: TLSs : A set of identified top-level services that are represented by (name, CF, SHG) tuples.

    // decompose the CIDG into a set of rooted components;
    // each rooted component is an MCIDG
 1  MCIDGs ← Run CIDG-Transformation Alg. on CIDG;
    // generate top-level service candidates,
    // represented as (name, CF, SHG) tuples
 2  Candidates ← φ;
 3  foreach MCIDG(Vm, Em) ∈ MCIDGs do
 4      Create a new graph G(V, E);
 5      V ← φ;
 6      E ← Em;
 7      for i ← 1 to |Vm| do
            // Vm(i) means the i-th node in Vm
 8          V(i) ← Facade(Vm(i), MCIDG, CIDG);
 9      end
10      Create a new tuple T(name, CF, SHG);
11      T.name ← null;
12      T.CF ← Root(G);
13      T.SHG ← G;
14      Add tuple T(name, CF, SHG) to Candidates;
15  end
    // validate the top-level service candidates and
    // assign a meaningful name to each accepted service
16  TLSs ← φ;
17  foreach tuple T ∈ Candidates do
18      The user validates the candidate by examining T.CF;
19      if T is acceptable then
20          T.name ← A meaningful name for the service;
21          Add T(name, CF, SHG) to TLSs;
22      end
23  end


Algorithm 5.2 describes the details of these three steps in the top-level service identification process. In Algorithm 5.2, each iteration of the for loop on line 3 transforms an MCIDG into a top-level service candidate. Function Facade() computes and returns the facade class sets for a given top-level service candidate and its low-level service candidates. As we have described, the facade class set contains the classes/interfaces that describe the functionality of the service to the outside world. Therefore, function Facade() returns a set of classes/interfaces that have incoming edges from classes/interfaces in the CIDG but not in the MCIDG. Function Root() returns the root of a given directed graph. The user validates a candidate by examining its facade class set, since the classes in the set represent the functionality of the service. At this stage, the SHG corresponding to each top-level service is built from the MCIDG and can therefore be viewed as a subgraph of the CIDG. In other words, the SHG is an abstraction of an MCIDG that hides the information unnecessary for understanding the service hierarchy. The functionality of each low-level service in the hierarchy is provided by a single class; hence, these services are called atomic services. In most cases, these atomic services are too fine-grained and have little reusability. However, the SHG at this stage provides a good starting point for identifying services with a higher level of granularity by using the service aggregation techniques presented in the subsequent section. After performing the top-level service identification, the critical top-level services of an existing system have been identified. Moreover, for each top-level service, we have extracted a service hierarchy graph (SHG) to model its low-level services. However, at this time, the low-level services in the SHG are atomic services with little or no reusability. We need to build a new SHG for each top-level service that contains low-level services with a higher level of granularity. Consequently, the low-level services in the new SHG are critical business services and have better reusability. This is achieved by the low-level service identification process.
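The idea behind Facade() can be sketched as follows, at the granularity of a whole MCIDG: a class or interface belongs to a facade set when some class in the CIDG but outside the MCIDG depends on it. This is a simplified, set-level illustration under that assumption, not the exact per-candidate routine of the framework.

import java.util.*;

public class FacadeComputation {

    static Set<String> facade(Set<String> mcidgNodes, Map<String, List<String>> cidgAdj) {
        Set<String> facade = new HashSet<>();
        for (Map.Entry<String, List<String>> e : cidgAdj.entrySet()) {
            if (mcidgNodes.contains(e.getKey())) continue;   // only external clients count
            for (String target : e.getValue())
                if (mcidgNodes.contains(target)) facade.add(target);
        }
        return facade;
    }
}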


5.3.2 Low-Level Service Identification

The low-level service identification process is the bottom-up portion of the entire service identification process. The SHGs built in the top-level service identification process are rooted directed graphs that represent the structural dependency between a top-level service and its low-level services (atomic services). As we have mentioned, these atomic services are too fine-grained and therefore have limited reusability. At this stage, we aim to aggregate highly related atomic services to build a new SHG for each top-level service, such that the services contained in the new SHG have a higher level of granularity and thus present a higher potential for reuse. The service aggregation is an iterative process, and the desired new SHG is achieved incrementally. The low-level services obtained from each iteration have a higher level of granularity than those of the previous iteration and hence modularize the top-level service in a different way. The resulting services of each iteration are presented to users as an intermediate SHG. An evaluation procedure can be performed at each iteration to determine whether specific goals have been reached. Users can then decide to repeat or terminate the process according to the pre-defined termination criteria. Algorithm 5.3 describes the low-level service identification process for a given top-level service. Essentially, it repeatedly runs the service aggregation algorithm (i.e., Algorithm 5.4) on the low-level services underneath a top-level service until the Termination Criteria are satisfied. Once the iteration terminates, the final SHG is built for the top-level service. Then, the algorithm represents the low-level services contained in the newly built SHG as the tuples defined in Section 5.1. Function MQ() computes the MQ metric of a given top-level service; it quantitatively measures the quality of the modularization of a top-level service as the trade-off between the inter-connectivity and intra-connectivity of its low-level services. Based on the modularization of the top-level service and the level of granularity of the low-level services underneath it, we define the following two Termination Criteria to stop the service aggregation iteration in Algorithm 5.3 :


Algorithm 5.3: Low-Level Service Identification
Input: CIRG : The CIRG of the system,
       CIDG : The CIDG of the system,
       T(name, CF, SHG) : The top-level service.
Output: LLSs : Identified low-level services represented in (name, CF, SHG) tuples,
        T(name, CF, SHG) : The input top-level service with its newly built SHG.

    // compute the MQ metric of the input top-level service
    Compute MQ(T.SHG, CIDG);
    // aggregate low-level services iteratively
    repeat
        SHGnew ← Run Service Aggregation Alg. on T.SHG;
        T.SHG ← SHGnew;
        Compute MQ(T.SHG, CIDG);
    until Termination Criteria are satisfied;
    // represent identified low-level services in tuples
    LLSs ← φ;
    foreach non-root node v ∈ T.SHG do
        Create a new tuple L(name, CF, SHG);
        L.name ← A meaningful name for the service;
        L.CF ← lV(v);
        L.SHG ← φ;
        Add L(name, CF, SHG) to LLSs;
    end

Termination Criterion 5.1. The top-level service has been nicely modularized by its low-level services.

Termination Criterion 5.2. The low-level services present an appropriate level of granularity.

In terms of the structure of a top-level service, the low-level services underneath it modularize the top-level service. By the definition of the MQ metric, the higher the value of the MQ metric of a top-level service, the better structured the service is. This is based on


the hypothesis that a well-modularized service becomes highly malleable; that is, the service can evolve in less time and at less cost. On the other hand, the level of granularity of services must be matched to the level of reusability and flexibility required for a given context. The basis of the second criterion is the hypothesis that a component that realizes a service with a higher level of granularity has better reusability.

Algorithm 5.4: Service-Aggregation
Input: CIRG : The CIRG of the system,
       CIDG : The CIDG of the system,
       SHG : The SHG that contains the low-level services to be aggregated,
       Heuristic1 : Reducing Heuristic 5.1,
       Heuristic2 : Reducing Heuristic 5.2.
Output: SHGnew : A new SHG that contains low-level services with a higher level of granularity.

    // SHG transformation
    SHGnew ← CollapseCliques(SHG, CIRG, CIDG);
    SHGnew ← CollapseStronglyConnectedComponents(SHGnew);
    // dominance tree generation
    DTree ← GenerateDominanceTree(SHGnew);
    // dominance tree reduction
    ReduceDominanceTree(DTree, Heuristic1);
    ReduceDominanceTree(DTree, Heuristic2);
    // SHG reconstruction
    SHGnew ← ReconstructSHG(DTree, CIDG);

Algorithm 5.4 aggregates highly related low-level services into a single service with a higher level of granularity and reconstructs a new SHG containing these newly identified services. The output SHG contains fewer low-level services, each with a higher level of granularity, than the input SHG. In other words, it modularizes the corresponding top-level service in a better way. The service aggregation is based on the dominance analysis on SHGs. As we have explained,


SHGs are rooted directed graphs, hence we can generate dominance trees from SHGs. However, in order to improve the shape of the generated dominance tree (i.e., increase the height of the tree), we first perform a graph transformation on the SHG. The purpose of the graph transformation is to agglomerate strongly related services and remove cycles in SHGs. Program units linked by recursion contribute to the implementation of a single functionality and can, therefore, be regarded as a single module. We remove cycles in SHGs by aggregating the services within a cycle into a single service. Where many services are involved in a cycle, poorer results of the dominance tree analysis are generally obtained [17, 36]. Our empirical studies in Chapter 7 show that collapsing strongly related services and removing cycles in SHGs are essential to dominance analysis on SHGs. In Algorithm 5.4, function CollapseCliques() collapses the services in a 3-clique in the input SHG if the similarity of the services in the clique exceeds a user-defined threshold. We have developed a methodology for computing the similarity between two services, based on the coupling analysis of the classes that implement these services [52]. Function CollapseStronglyConnectedComponents() iteratively detects a strongly connected component (described in Section 5.2.1) in a directed graph, collapses all nodes in the component into one node, and updates the edges accordingly, until there is no strongly connected component left. Consequently, the output graph of this function is a directed acyclic graph (DAG); the output SHG of the SHG transformation contains no cycle. Once the SHG transformation is done, function GenerateDominanceTree() generates the service dominance tree from the new SHG. Function ReduceDominanceTree() reduces a dominance tree by applying a given reducing heuristic.
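The cycle-removal step can be sketched with Tarjan's strongly connected components algorithm, which builds the condensation of the graph in one pass; node identifiers and the output representation below are illustrative, not taken from the JComp source.

import java.util.*;

public class SccCollapser {
    int n, index = 0, sccCount = 0;
    int[] low, num, comp;
    boolean[] onStack;
    Deque<Integer> stack = new ArrayDeque<>();
    List<List<Integer>> adj;

    SccCollapser(List<List<Integer>> adj) {
        this.adj = adj;
        n = adj.size();
        low = new int[n]; num = new int[n]; comp = new int[n];
        onStack = new boolean[n];
        Arrays.fill(num, -1);
        for (int v = 0; v < n; v++) if (num[v] == -1) dfs(v);
    }

    void dfs(int v) {
        num[v] = low[v] = index++;
        stack.push(v); onStack[v] = true;
        for (int w : adj.get(v)) {
            if (num[w] == -1) { dfs(w); low[v] = Math.min(low[v], low[w]); }
            else if (onStack[w]) low[v] = Math.min(low[v], num[w]);
        }
        if (low[v] == num[v]) {            // v is the root of an SCC
            int w;
            do { w = stack.pop(); onStack[w] = false; comp[w] = sccCount; } while (w != v);
            sccCount++;
        }
    }

    // the condensation: one node per SCC, edges only between distinct SCCs,
    // so the result is a DAG (each SCC stands for one aggregated service)
    List<Set<Integer>> condense() {
        List<Set<Integer>> dag = new ArrayList<>();
        for (int i = 0; i < sccCount; i++) dag.add(new HashSet<>());
        for (int v = 0; v < n; v++)
            for (int w : adj.get(v))
                if (comp[v] != comp[w]) dag.get(comp[v]).add(comp[w]);
        return dag;
    }
}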


We define two reducing heuristics as follows :

Heuristic 5.1. Remove each maximal consolidation subtree, keeping only the root node of the subtree.

Agglomerating all services that are part of a maximal consolidation subtree into one service makes sense because these services constitute an independent unit that can only be accessed by the rest of the services of the system through the root of the subtree. In order to simplify the visualization, we only need to present the root, because the rest of the subtree is only visible to the root and can be hidden inside it.

Heuristic 5.2. In each subtree that contains both ddom and sddom edges, remove all leaf nodes that are linked to the root of the subtree by sddom edges.

These leaf nodes represent low-level services that are only accessible to the service represented by the root of the subtree. Therefore, these low-level services can be considered sub-services of the root. Function ReconstructSHG() recovers the service hierarchy for the services presented in a service dominance tree. It needs the CIDG to provide extra information, since the service dominance tree is an abstraction of a service hierarchy graph in which some information is lost. After performing the low-level service identification for each top-level service identified from an existing object-oriented system, the critical low-level services underneath each top-level service have been identified. Finally, the SHGs of all top-level services yield the ServView of the system.
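Under one plausible reading of Heuristic 5.1, a subtree is folded into its root whenever every edge below a node is an sddom edge. The sketch below follows that reading; the tree representation is an assumption made purely for illustration.

import java.util.*;

class DomNode {
    String name;
    List<DomNode> children = new ArrayList<>();
    List<Boolean> sddomEdge = new ArrayList<>();  // label of the edge to each child

    // true iff every edge in this node's (full) subtree is an sddom edge
    boolean allSddom() {
        for (int i = 0; i < children.size(); i++)
            if (!sddomEdge.get(i) || !children.get(i).allSddom()) return false;
        return true;
    }

    // Heuristic 5.1 (as read above): fold each maximal all-sddom subtree into its root
    void reduce() {
        for (int i = 0; i < children.size(); i++) {
            DomNode c = children.get(i);
            if (!c.children.isEmpty() && c.allSddom()) {
                c.children.clear();   // keep only the subtree's root node
                c.sddomEdge.clear();
            } else {
                c.reduce();           // look for consolidation subtrees deeper down
            }
        }
    }
}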

5.3.3 An Example : Car Rental System

To further explain the proposed service identification processes, in this section we identify the business services embedded in the CRS example by applying the algorithms introduced above. First of all, we identify the top-level services of the CRS system by running Algorithm 5.2 on the CIDG of the CRS system, which is depicted in Figure 4.10. Algorithm 5.1 decomposes the CIDG into rooted components (i.e., MCIDGs). Figure 5.6 depicts the resulting MCIDGs. There are three MCIDGs generated from the CRS system : graphs (a), (b), and (c) in Figure 5.6.


[Figure: the three MCIDGs of the CRS system. (a) Rooted at com.uwstar.crs.Booking; contains IBooking, VehicleRepository, person.Agent, person.Person, person.Customer, vehicle.Vehicle, vehicle.Car, vehicle.Truck, vehicle.SUV, training.TrainingCourse, training.TrainingPlan, record.Record, record.DrivingRecord, and record.CreditRecord. (b) Rooted at com.uwstar.crs.VehicleEvaluation; contains person.Customer, person.Person, record.Record, record.CreditRecord, and record.DrivingRecord. (c) Rooted at com.uwstar.crs.person.Dealer.]

Figure 5.6: The MCIDGs of the Car Rental System.


Based on the MCIDGs extracted by Algorithm 5.1, Algorithm 5.2 generates the following top-level service candidates (TLSC) :

• TLSC1 : (null, {com.uwstar.crs.Booking}, SHG1).
• TLSC2 : (null, {com.uwstar.crs.VehicleEvaluation}, SHG2).
• TLSC3 : (null, {com.uwstar.crs.person.Dealer}, SHG3).

SHG1, SHG2, and SHG3 are graphs (a), (b), and (c) in Figure 5.6, respectively.

[Figure: the SHG of VehicleBooking, rooted at com.uwstar.crs.Booking; the non-root nodes are the low-level services (atomic services) underneath the top-level service.]

Figure 5.7: The SHG of the Top-Level Service VehicleBooking.

By examining the functionality of each top-level service candidate, we find that the candidate (null, {com.uwstar.crs.person.Dealer}, SHG3) is not a critical business service : the class com.uwstar.crs.person.Dealer is a dead class. Hence, after the service validation, we accept two top-level services (TLS) of the CRS system :


• TLS1 : (VehicleBooking, {com.uwstar.crs.Booking}, SHG1).
• TLS2 : (VehicleEvaluation, {com.uwstar.crs.VehicleEvaluation}, SHG2).

After running Algorithm 5.2, the critical top-level services of the CRS system have been identified. Moreover, for each top-level service, we have extracted a service hierarchy graph (SHG) to model its low-level services. Figure 5.7 illustrates the SHG of the identified top-level service VehicleBooking. At this stage, a low-level service in the SHG is a single class (an atomic service) with little or no reusability. We need to build a new SHG for each top-level service that contains low-level services (groups of classes) with a higher level of granularity.

[Figure: the transformed SHG, rooted at com.uwstar.crs.Booking, with low-level service nodes VehicleRepository, person.Agent, person.Customer, record.DrivingRecord, record.CreditRecord, and the aggregated nodes {vehicle.SUV, vehicle.Vehicle}, {vehicle.Truck, vehicle.Vehicle}, and {vehicle.Car, vehicle.Vehicle}.]

Figure 5.8: The Resulting SHG After Performing the SHG Transformation on the Original SHG of the Top-Level Service VehicleBooking in the CRS System.

Now, we are ready to identify the low-level services underneath the top-level services by running Algorithm 5.3 on each top-level service. To save space, we only identify the low-level services underneath the top-level service VehicleBooking.


[Figure: the service dominance tree, rooted at com.uwstar.crs.Booking, over the nodes of the SHG in Figure 5.8.]

Figure 5.9: The Service Dominance Tree of the SHG in Figure 5.8.

Essentially, Algorithm 5.3 computes the MQ metric of VehicleBooking and runs Algorithm 5.4 repeatedly. In this example, in order to give the identified low-level services an appropriate level of granularity, we use Termination Criterion 5.2 to terminate the service aggregation iteration. In the first iteration of Algorithm 5.4, Figure 5.8 shows the resulting SHG after performing the SHG transformation on the original SHG (shown in Figure 5.7) of the top-level service VehicleBooking. The resulting SHG is obtained by aggregating the strongly related atomic services in the original SHG. For instance, the two services represented by the nodes com.uwstar.crs.vehicle.SUV and com.uwstar.crs.vehicle.Vehicle have an inheritance relationship and thus are agglomerated into one service, represented by the node {com.uwstar.crs.vehicle.SUV, com.uwstar.crs.vehicle.Vehicle} in the SHG depicted in Figure 5.8. The facade class set of the agglomerated service contains both com.uwstar.crs.vehicle.SUV and com.uwstar.crs.vehicle.Vehicle, because these two classes both provide services to the outside of the new service. Also, there are three nodes in Figure 5.7 which form a cycle :


com.uwstar.crs.person.Agent, com.uwstar.crs.training.TrainingCourse, and com.uwstar.crs.training.TrainingPlan. Hence, the low-level services represented by these nodes are agglomerated into a service represented by the node com.uwstar.crs.person.Agent in Figure 5.8. The facade class set contains only the class com.uwstar.crs.person.Agent, because the other two classes, com.uwstar.crs.training.TrainingCourse and com.uwstar.crs.training.TrainingPlan, do not provide services to the outside of the new service. Once the SHG transformation is complete, function GenerateDominanceTree() generates the service dominance tree from the new SHG. Figure 5.9 shows the service dominance tree of the SHG depicted in Figure 5.8. Function ReduceDominanceTree() reduces the service dominance tree in Figure 5.9 by applying Heuristic 5.1 and Heuristic 5.2. Figure 5.10 shows the reduced dominance tree.

[Figure: the reduced dominance tree, rooted at com.uwstar.crs.Booking, with children person.Agent, VehicleRepository, person.Customer, {vehicle.SUV, vehicle.Vehicle}, {vehicle.Truck, vehicle.Vehicle}, and {vehicle.Car, vehicle.Vehicle}.]

Figure 5.10: The Reduced Dominance Tree of the Service Dominance Tree in Figure 5.9.


Function ReconstructSHG() recovers the service hierarchy for the services presented in the service dominance tree in Figure 5.10. Figure 5.11 shows the SHG reconstructed from the reduced service dominance tree in Figure 5.10.

[Figure: the reconstructed SHG, rooted at com.uwstar.crs.Booking, whose children are the aggregated low-level services listed below.]

Figure 5.11: The SHG Reconstructed from the Reduced Service Dominance Tree in Figure 5.10.

After the first iteration, by examining the MQ metric of the top-level service VehicleBooking and the granularity of the low-level services underneath it, we know whether or not the termination criteria are satisfied; we repeat the service aggregation process if they are not. If they are satisfied, we terminate the process and identify the following low-level services for the top-level service VehicleBooking :

• (Car, {com.uwstar.crs.vehicle.Car, com.uwstar.crs.vehicle.Vehicle}, φ)
• (Truck, {com.uwstar.crs.vehicle.Truck, com.uwstar.crs.vehicle.Vehicle}, φ)
• (SUV, {com.uwstar.crs.vehicle.SUV, com.uwstar.crs.vehicle.Vehicle}, φ)
• (VehicleRepository, {com.uwstar.crs.VehicleRepository}, φ)
• (Agent, {com.uwstar.crs.person.Agent}, φ)
• (Customer, {com.uwstar.crs.person.Customer}, φ)

5.4 Summary

In this chapter, we have discussed the two processes contained in the service identification stage of the SOC4J framework, namely top-level service identification and low-level service identification. The techniques used in this stage have also been introduced. The critical business services embedded in an existing system have been identified and modeled. In the subsequent chapter, we will introduce the approach to packaging identified services into self-contained components and the methodology for transforming the existing system into a component-based system.

Chapter 6

Component Generation and System Transformation

In the previous chapter, we presented the methodology for identifying services embedded in an existing object-oriented software system. We categorize the critical business services embedded in the system into two categories : top-level services and low-level services. Top-level services, and the low-level services underneath each top-level service, can be identified by applying the proposed approach. The identified services must be packaged as components so that they can be deployed and thus invoked. Another goal of the proposed SOC4J framework is to reconstruct the existing system into a component-based system, based on the components that realize the identified services. This chapter discusses the service realization process and the system reconstruction process. In Section 6.1, we discuss how an identified service can be realized as a self-contained component. A transformation technique that automatically reconstructs the existing system into a component-based target system is introduced in Section 6.2. Finally, Section 6.3 gives a summary of this chapter.

6.1 Component Generation

Component-based development (CBD) assembles software from reusable components within frameworks such as CORBA, Sun's Enterprise JavaBeans (EJB), and Microsoft COM. The service-oriented architecture (SOA) encourages individual services to be self-contained. To reuse the identified services and migrate the existing system's implementation into a component-based architecture, it is necessary to package the identified services into well-documented and self-contained components. A self-contained component is a component that contains all the code necessary to implement its services and hence can be deployed and invoked independently. At the third stage of the proposed SOC4J framework, we realize each top-level service, and the low-level services contained in its SHG, as self-contained components.

The component-based development (CBD) assembles software from reusable components within frameworks such as CORBA, Sun’s Enterprise JavaBeans (EJBs) and Microsoft COM. The serviceoriented architecture (SOA) encourages individual services to be self-contained. To reuse the identified services and migrate the existing system’s implementation into a component-based architecture, it is necessary to package the identified services into well-documented and selfcontained components. A self-contained component is a component that contains all the code necessary to implement its services and hence can be deployed and invoked independently. At the third stage of the proposed SOC4J framework, we realize each top-level service and the lowlevel services contained in its SHG into self-contained components.

6.1.1 Approach

We package each identified service (either a top-level service or a low-level service) to generate a self-contained component. A component that realizes a top-level service is called a Top-Level Component (TLC), while a component that realizes a low-level service is called a Low-Level Component (LLC). In order to explain the component generation process clearly and automate the process in the implementation, we describe a generated component as a tuple :

    (name, if, CF, CC, CHG)

In the above tuple, name is the name of the component, if is the interface that provides the entry point of the component, CF is the facade class set of the realized service (we also call it the Facade Class Set of the component), CC is the Constituent Class Set, which contains all classes/interfaces that are necessary to implement the component, and CHG is the Component Hierarchy Graph that is associated with a top-level component to describe its low-level components. The CHG is defined in Definition 6.1. We export and store the generated component, represented by the above tuple, as an XML document.
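For illustration, the tuple can be represented directly as a data structure; the field names below mirror the (name, if, CF, CC, CHG) notation and are not taken from the JComp source.

import java.util.Set;

public class ComponentDescriptor {
    String name;                       // component name
    String iface;                      // "if": name of the entry-point interface
    Set<String> facadeClassSet;        // CF
    Set<String> constituentClassSet;   // CC
    Object componentHierarchyGraph;    // CHG; empty for a low-level component
}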


The XML schema for the component is illustrated in Figure 6.1.

Definition 6.1. The Component Hierarchy Graph (CHG) associated with a top-level component is a rooted LDG, where the root, r ∈ V, represents the top-level component, V \ r represents the set of low-level components contained in the top-level component, lV(v) returns the name of v for any v ∈ V, E = {(v, w) ∈ V × V | v contains w}, LE = φ, and hence lE(e) returns an empty label for any e ∈ E.

The CHG shows the structural relationships between the low-level components underneath a top-level component. Like the SHG, the CHG gives a high-level representation of the components that is understandable by both developers and business experts. Also, the CHG describes the modularization of its top-level component. There is no CHG associated with a low-level component; that is, CHG = φ for a low-level component. That is because the low-level component has already been presented in the CHG of its top-level component. The CHGs of all top-level components form the component view (CompView) of the system. Before we present the technique for automatically generating components, we introduce the reachability concept in the CIDG and CIRG, which we use in the component generation process.

Definition 6.2. Let G = (V, E) be the CIDG of an existing object-oriented system, where V represents all nodes (i.e., classes or interfaces) in G and E represents all edges (i.e., dependencies) in G. Given two classes v ∈ V and w ∈ V, class w is said to be reachable from class v if there exists a directed path from v to w, denoted by v →* w.

Definition 6.3. Let G = (V, E) be the CIRG of an existing object-oriented system, where V represents all nodes (i.e., classes or interfaces) in G and E represents all edges (i.e., relationships) in G. Given two classes v ∈ V and w ∈ V, class w is said to be inheritance (realization) reachable from class v ∈ CIRG.V if there exists a directed path from v to w and the labels of all edges in this path contain inheritance (realization) types, denoted by v →IN* w (v →RE* w).

[Figure: the XML schema for a component, in UML representation. A Component element holds name : xsd:string, interface : xsd:string, a FacadeClassSet and a ConstituentClassSet (each a sequence of class/interface names), and a ComponentHierarchyGraph (a sequence of component names).]

Figure 6.1: The UML Representation of the XML Schema for a Component.

We extend the refactoring approach presented in [90] to automatically generate an interface for each component corresponding to an identified service. Let serv be an identified service represented by the tuple serv(name, CF, SHG) and comp be the generated component represented by the tuple comp(name, if, CF, CC, CHG). The key steps for generating the component are enumerated as follows :

• Step 1 : Name the component by copying its service's name, comp.name = serv.name.

• Step 2 : Compute the facade class set of the component by copying its service's facade class set, comp.CF = serv.CF.

• Step 3 : Compute the constituent class set of the component, comp.CC = comp.CF ∪ ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v}.


• Step 4 : Create a new interface named if. Modify each class in comp.CF to implement if. Modify each interface in comp.CF to extend if.

• Step 5 : Add declarations of all public methods defined in each class in V_IN to if, where V_IN = ⋃_{c ∈ comp.CF} {v ∈ CIRG.V | c →IN* v}, and modify each class in V_IN to implement if.

• Step 6 : Copy declarations of all public methods declared in each interface in V_RE to if, where V_RE = ⋃_{c ∈ comp.CF} {v ∈ CIRG.V | c →RE* v}, and modify each interface in V_RE to extend if.

• Step 7 : Add declarations of setter and getter methods for all public class fields declared in each class in comp.CF ∪ V_IN to if, and implement the corresponding setter and getter methods in the classes where these fields are originally declared.

• Step 8 : Add declarations of getter methods for all public class fields declared in each interface in comp.CF ∪ V_RE to if, and implement the corresponding getter methods in the classes that implement the interfaces where these fields are originally declared.

• Step 9 : Assign the newly built interface to the component, comp.if = if.

• Step 10 : Generate the component hierarchy graph (CHG) for the component,

    comp.CHG = G, if serv.SHG ≠ φ (i.e., serv is a top-level service); comp.CHG = φ, otherwise;

where G is a copy of serv.SHG, except that the names of all nodes in G are changed to the corresponding service names, rather than the facade class names.


Note that the source modification in the above steps does not change the observable behavior of the original system. Once the tuple (name, if , CF , CC , CHG) for a component has been constructed, we can package all classes and interfaces within CC together with the newly created interface if into a JAR file named name.jar. The packaged component is self-contained and loosely coupled and hence can be deployed and used independently.
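The final packaging step might be sketched as follows, assuming the compiled .class files of the constituent classes are available under a common classes root; the paths, entry names, and method names are illustrative.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.jar.*;

public class ComponentPackager {

    static void packageComponent(String name, Map<String, Path> classFiles, Path outDir)
            throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        Path jar = outDir.resolve(name + ".jar");          // e.g., Customer.jar
        try (JarOutputStream out =
                 new JarOutputStream(Files.newOutputStream(jar), manifest)) {
            for (Map.Entry<String, Path> e : classFiles.entrySet()) {
                // key: JAR entry name, e.g. "com/uwstar/crs/person/Customer.class"
                out.putNextEntry(new JarEntry(e.getKey()));
                Files.copy(e.getValue(), out);
                out.closeEntry();
            }
        }
    }
}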

6.1.2 An Example

To further describe the component generation process, let us give an example of realizing an identified service. In Chapter 5, we identified services from the hypothetical CRS system. One of these, Customer, is a low-level service underneath the top-level service VehicleBooking, represented by the tuple serv(name, CF, SHG) where serv.name = Customer, serv.CF = {com.uwstar.crs.person.Customer}, and serv.SHG = φ.

Let the tuple comp(name, if, CF, CC, CHG) represent the component that realizes the service Customer. The steps for realizing the service are enumerated as follows (part of the UML class diagram of the component is shown in Figure 6.3) :

1. comp.name = serv.name = Customer.

2. comp.CF = serv.CF = {com.uwstar.crs.person.Customer}.

3. Note that ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} represents all classes or interfaces that are reachable in the CIDG from some class in comp.CF.

[Figure: UML class diagrams. Customer — public field id : String; private fields creditRecord : CreditRecord, drivingRecords : DrivingRecord[]; public methods Customer(), Customer(String), updateCreditRecord(int), addDrivingRecord(String), getCreditStatus() : int, isSafeDriver() : boolean, evaluateVehicles() : String[]. Person — private fields name, address, phoneNumber : String; public methods Person(), setName(String), getName() : String, setAddress(String), getAddress() : String, setPhoneNumber(String), getPhoneNumber() : String.]

Figure 6.2: The UML Class Diagrams of Customer and Person in the CRS System.

In this example,

    ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} = { com.uwstar.crs.person.Person, com.uwstar.crs.record.CreditRecord, com.uwstar.crs.record.DrivingRecord, com.uwstar.crs.record.Record }

Then, we have

    comp.CC = comp.CF ∪ ⋃_{c ∈ comp.CF} {v ∈ CIDG.V | c →* v} = { com.uwstar.crs.person.Customer, com.uwstar.crs.person.Person, com.uwstar.crs.record.CreditRecord, com.uwstar.crs.record.DrivingRecord, com.uwstar.crs.record.Record }

4. Create a new interface named ICustomer. Since there is only one class in comp.CF (i.e., com.uwstar.crs.person.Customer), we modify this class to implement ICustomer as


shown in Figure 6.3.

5. The inheritance reachable class set of class com.uwstar.crs.person.Customer is extracted as follows : V_IN = {com.uwstar.crs.person.Person}. Figure 6.2 depicts the UML class diagrams of class com.uwstar.crs.person.Customer and class com.uwstar.crs.person.Person. We add declarations of all public methods defined in class com.uwstar.crs.person.Person to ICustomer, and we modify class com.uwstar.crs.person.Person to implement the interface ICustomer. These modifications are reflected in Figure 6.3.

6. Since the realization reachable class set of class com.uwstar.crs.person.Customer is empty (i.e., V_RE = ∅), no action is needed in this step.

7. As Figure 6.2 shows, there is only one public class field declared in class com.uwstar.crs.person.Customer (i.e., id) and no public class field in class com.uwstar.crs.person.Person. We add the setter method declaration setID(String) and the getter method declaration getID() : String to interface ICustomer. We also need to implement these two methods in class com.uwstar.crs.person.Customer. Listing 6.1 lists the implementation of these two methods. These modifications are also reflected in Figure 6.3.

8. Again, since V_RE = ∅, no action is needed in this step.

9. comp.if = ICustomer.

10. comp.CHG = φ, because the service Customer is a low-level service; hence, the generated component is a low-level component. If the service were a top-level service, the CHG of the generated component would be the SHG of the top-level service, except that node names in the SHG are changed to the corresponding service names.

[Figure: ICustomer, the newly created interface for component Customer, declaring updateCreditRecord(int), addDrivingRecord(String), getCreditStatus() : int, isSafeDriver() : boolean, evaluateVehicles() : String[], setID(String), getID() : String, setName(String), getName() : String, setAddress(String), getAddress() : String, setPhoneNumber(String), and getPhoneNumber() : String. Class Customer (which extends Person) and class Person implement it, with newly added methods implementing the declarations in ICustomer.]

Figure 6.3: Part of the UML Class Diagram of the Component Customer.

Now we are ready to package the following classes (i.e., the constituent class set) : com.uwstar.crs.person.Customer, com.uwstar.crs.person.Person, com.uwstar.crs.record.CreditRecord, com.uwstar.crs.record.DrivingRecord, and com.uwstar.crs.record.Record, together with the newly created interface ICustomer, as a JAR file named Customer.jar.


public class Customer extends Person implements ICustomer {

    public String id; // customer ID
    ...
    public void setID(String id) {
        this.id = id;
    }

    public String getID() {
        return id;
    }
    ...
}

Listing 6.1: The Implementation of Methods setID and getID in Class Customer.
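For completeness, a sketch of the generated ICustomer interface implied by Figure 6.3 follows; the parameter names are assumptions, since the figure shows only the types.

public interface ICustomer {
    void updateCreditRecord(int amount);
    void addDrivingRecord(String record);
    int getCreditStatus();
    boolean isSafeDriver();
    String[] evaluateVehicles();
    void setID(String id);
    String getID();
    void setName(String name);
    String getName();
    void setAddress(String address);
    String getAddress();
    void setPhoneNumber(String phoneNumber);
    String getPhoneNumber();
}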

6.2 System Transformation

One of the primary goals of the proposed SOC4J framework is to transform the monolithic architecture of an existing object-oriented system to a more flexible service-oriented architecture. In the previous stages of the framework, we have identified services and packaged the identified services into self-contained components. Now, we introduce a reconstruction technique that automatically reconstructs the existing source system into a component-based target system.

6.2.1 Approach

The reconstruction process is based on the extracted components. In this thesis, extracted components are categorized into two classes : top-level components and low-level components. A top-level component has an associated component hierarchy graph (CHG) to describe the low-level components contained in it. Each component is self-contained and has been packaged into a JAR file. Based on the extracted components, we design a meta-model, depicted in Figure 6.4, for the component-based target system. The target system is composed

of one or more top-level components, as well as a set of classes/interfaces, while each top-level component might consist of some low-level components together with a set of classes and interfaces. Like a top-level component, a low-level component might contain other low-level sub-components, classes, and interfaces.

[Figure: the meta-model. The target system (component-based system) contains top-level components (JAR files) and classes/interfaces (Java files); each top-level component contains low-level components (JAR files) and classes/interfaces; a low-level component may in turn contain further low-level components and classes/interfaces.]

Figure 6.4: The Meta-Model for the Component-Based Target System.

In the source system, some classes or interfaces may not be identified as business services or may not be contained in identified business services; therefore, these classes or interfaces are not packaged into components. In order to preserve the behavior of the system, we have to include these classes or interfaces in the component-based target system. We reconstruct the target system by adopting a bottom-up integration technique that works with the extracted components, starting with the components in the lowest position in the component hierarchy. The reconstruction process should not change the observable behavior of the existing system. The surrounding parts of a component should use the newly extracted component in order to avoid the situation where two sets of classes providing the same functionality exist in the same system. Algorithm 6.1 describes the transformation process, taking the source system and the components extracted from it as input. Extracted components are represented as tuples of the form (name, if, CF, CC).

Algorithm 6.1: System-Transformation
Input: An existing object-oriented system and the components extracted from the system.
Output: A component-based target system.

    foreach top-level component t do
        while there exists a low-level component in t.CHG do
            // start with the component in the lowest position in the component hierarchy
            c ← a node without descendants in t.CHG;
            // retrieve the components that contain component c
            P ← parents of c in t.CHG;
            // refactor the parents of component c to use c
            foreach p ∈ P do
                Change the code of classes in p.CC that reads (or writes) the public fields of classes in c.CF to code that invokes the corresponding getter (or setter) methods in interface c.if;
                Replace the reference types in classes in p.CC that refer to any class in c.CF with interface c.if;
            end
            // update t.CHG to remove component c
            Remove node c from t.CHG;
        end
    end

The output of the algorithm is an instance of the meta-model described in Figure 6.4.
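The effect of the two refactoring steps inside the loop can be illustrated with a hypothetical client from the CRS example (it relies on the Customer and ICustomer classes of Section 6.1.2): a direct read of the public field id of a facade class is replaced by a call to the generated getter, and the declared reference type is replaced by the component interface.

public class BookingClient {

    String beforeTransformation() {
        Customer customer = new Customer("C-001"); // concrete facade class
        return customer.id;                        // direct public-field read
    }

    String afterTransformation() {
        ICustomer customer = new Customer("C-001"); // reference type replaced by c.if
        return customer.getID();                    // field read becomes a getter call
    }
}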

6.2.2 An Example

To further describe the system transformation process, we give an example of reconstructing the CRS system into a component-based target system. Consider the following top-level services identified after the service identification stage :

• (Vehicle Booking, {com.uwstar.crs.Booking}, SHG_VB). The service hierarchy graph SHG_VB is shown in Figure 6.5 (a).
• (Vehicle Evaluation, {com.uwstar.crs.VehicleEvaluation}, SHG_VE). The service hierarchy graph SHG_VE is shown in Figure 6.5 (b).

[Figure: (a) the SHG rooted at com.uwstar.crs.Booking, with low-level services person.Agent, person.Customer, and the aggregated node {VehicleRepository, vehicle.Car, vehicle.Truck, vehicle.SUV}; (b) the SHG rooted at com.uwstar.crs.VehicleEvaluation, with low-level service person.Customer.]

Figure 6.5: The Service Hierarchy Graphs of the CRS System.

We have two top-level components generated after the component generation stage, and the low-level components underneath each top-level component are described in the related component hierarchy graph. The two top-level components are described as follows :

[Figure: (a) the CHG rooted at Vehicle Booking, with low-level components Agent, Customer, and Vehicle Repository; (b) the CHG rooted at Vehicle Evaluation, with low-level component Customer.]

Figure 6.6: The Component Hierarchy Graphs of the CRS System.

• (Vehicle Booking, IBooking, CF_1, CHG_VB). The component hierarchy graph CHG_VB is shown in Figure 6.6 (a).


• (Vehicle Evaluation, IEvaluation, CF_2, CHG_VE). The component hierarchy graph CHG_VE is shown in Figure 6.6 (b).

After running Algorithm 6.1, we get the component-based version of the CRS system, as shown in Figure 6.7. The component-based system has the same functionality as the original system.

[Figure: the component-based Car Rental System contains the class Dealer and the top-level components :Vehicle Booking (interface IBooking) and :Vehicle Evaluation (interface IEvaluation), which in turn contain the low-level components :Vehicle Repository (IRepository), :Agent (IAgent), and :Customer (ICustomer).]

Figure 6.7: The Component-Based Car Rental System.

6.3 Summary

In this chapter, we explained the processes contained in the component generation stage and system transformation stage of the SOC4J framework. We have discussed how an identified service can be realized as a self-contained component and how the existing system can be reconstructed into a component-based system based on the components that realize the identified services.

Chapter 7

Empirical Studies

In this chapter, we perform a set of empirical studies on the proposed SOC4J framework to assess the service-oriented componentization techniques introduced in this thesis. The proposed technique has been implemented in a prototype that aims to i) identify critical business services embedded in an existing Java system, ii) realize the identified services as self-contained reusable components, and iii) transform the existing system into a component-based system. Therefore, the purpose of the empirical studies in this chapter is to test the effectiveness of the proposed SOC4J framework and assess i) the usefulness, in terms of feasibility and effectiveness, of the architecture recovery and representation approach, ii) the usefulness, in terms of efficiency and effectiveness, of the business service identification technique, iii) the usefulness, in terms of effectiveness, of the identified service modeling and packaging techniques, and iv) the time and space complexity of the service-oriented componentization technique as a function of source code size. We outline the implementation of the prototype for the SOC4J framework in Section 7.1. In Section 7.2, we discuss two evaluation criteria for the proposed framework. We present empirical studies on two Java open source projects in Sections 7.3 and 7.4. Finally, we summarize this chapter in Section 7.5.

7.1 A Prototype for the SOC4J Framework

As a part of this work, the proposed service-oriented componentization approach has been implemented in a prototype which offers an interactive and integrated environment for i) identifying critical business services embedded in an existing Java system, ii) realizing each identified service as a self-contained component, and iii) transforming the object-oriented design into a service-oriented architecture. We have named the prototype JComp, a Java Componentization Kit. The JComp is an integrated tool workbench targeted at rapidly integrating software tools for prototyping the SOC4J framework. We now examine the tool integration requirements for the SOC4J framework and discuss the implementation of the JComp.

As a part of this work, the proposed service-oriented componentization approach has been implemented in a prototype which offers an interactive and integrated environment for i) identifying critical business services embedded in an existing Java system, ii) realizing each identified service as a self-contained component, and iii) transforming the object-oriented design into a service-oriented architecture. We have named the prototype JComp, an Java Componentization Kit. The JComp is an integrated tool workbench targeted at rapidly integrating software tools for prototyping the SOC4J framework. Now, we examine the tool integration requirements for the SOC4J framework and discuss the implementation of the JComp.

7.1.1 Tool Integration Requirements

As we discussed in Chapter 3, several software tools are needed for the SOC4J framework to componentize an object-oriented system and re-modularize the existing assets to support service functionality. Figure 7.1 depicts the tool interconnection of the SOC4J framework. Five rounded rectangles on the right side of the figure represent the tools needed for the SOC4J framework, while the flow of data needed for integrating the tools within the framework is shown by the thick arrow on the right side of the diagram. The functionality of each tool is outlined as follows :

Source Code Modeling This tool parses the Java source code and outputs a set of raw data of the facts. Based on the extracted facts, the tool further generates the source code models defined in Chapter 4, including JPackage, JFile, JClass, and JMethod. The raw data set and source code models are exported as XML documents.

Architecture Modeling Based on the source code models, this tool identifies all class relationships defined in Chapter 4. It exports the identified relationships in graph representations, that is, the CIRG and CIDG. Basic reusability attributes for each class in the system are also computed. The CIRG and CIDG are exported as XML documents.


[Figure: the flow of integration data. Java source code feeds Source Code Modeling (facts, source code models), which feeds Architecture Modeling (CIRG, CIDG), then Service Identification (identified services), Component Generation (self-contained components), and System Transformation (component-based system), all within the integrated tool workbench for the SOC4J framework.]

Figure 7.1: The Tool Interconnection for the SOC4J Framework.

Service Identification This tool assists users in identifying the business services embedded in an existing Java system through analysis of the CIRG and CIDG. First, it identifies the top-level services of the system and builds a service hierarchy graph for each identified top-level service. Then, it performs a graph transformation on the service hierarchy graph to identify low-level services for each top-level service.

Component Generation This tool realizes identified services into self-contained components. For each identified service, it extracts all classes/interfaces that are necessary for implementing the service, generates an interface for the service, and packages these classes/interfaces together with the interface as a JAR file.

System Transformation This tool reconstructs an existing Java system into a component-based


system by using the components generated from the source system. The system transformation process preserves the functionality of the source system.

7.1.2 JComp RCP Application

The JComp is built on top of the Eclipse Rich Client Platform (RCP) [68] and hence is called an Eclipse RCP application. An Eclipse RCP application is a collection of plug-ins and the Runtime on which they run. The platform-independent Eclipse RCP architecture makes rich-client applications easy to write because business logic is organized into reusable components called plug-ins. Eclipse RCP provides a core set of services, representing a substantial percentage of the rich client platform development functionality, so that developers do not have to rewrite infrastructure code. These Eclipse RCP services are available to every application component plug-in. These services are the interface between a plug-in and the low-level platform-specific functionality that supports the plug-in, just as a J2EE container is the interface between an EJB and the application server. Moreover, because of the Eclipse open source license, we can use the technologies that went into Eclipse to create our own commercial-quality programs. The GUI toolkits used by Eclipse RCP are the same as those used by the Eclipse IDE and enable applications with optimal performance that have a native look and feel on any platform they run on. The architecture of the JComp toolkit is depicted in Figure 7.2. The internals of the JComp are the same OSGi runtime and GUI toolkit provided by the Eclipse IDE. The OSGi runtime enables Java code from multiple sources to run together in a single Java Virtual Machine (JVM). The OSGi framework automatically loads and runs bundles, which are encapsulations of various files. This provides the mechanism by which plug-ins can be automatically detected and loaded into the JComp RCP application. The resource manager provides a GUI to show the current configuration, that is, a list of installed plug-ins. It assists the end user in finding and installing new plug-ins. It is also capable of scanning through the list of already-installed plug-ins to look for updates to


these plug-ins.

Figure 7.2: The Architecture of the JComp Java Componentization Kit (the Parser, Modeler, Extractor, Generator, and Transformer plug-ins and the resource manager sit on the Eclipse RCP Platform, which comprises the generic workbench UI, JFace, SWT, and the OSGi platform runtime).

The Standard Widget Toolkit (SWT) provides a completely platform-independent API that is tightly integrated with the operating system's native windowing environment. Java widgets actually map to the platform's native widgets. This gives Java applications a look and feel that makes them virtually indistinguishable from native applications. The JFace toolkit is a platform-independent user interface API that extends and interoperates with SWT. This library provides a set of components and helper utilities that simplify many of the common tasks in developing SWT user interfaces. The generic workbench provides extension points that the plug-ins extend. The plug-ins provide functionality that is integrated into the RCP platform just as if it were always part of the application. As Figure 7.2 depicts, each tool described in Section 7.1.1 was implemented as a separate JComp plug-in. A snapshot of the JComp Java Componentization Kit is depicted in Figure 7.3.
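As an illustration of the boilerplate such an application rests on, the following minimal sketch (ours; JComp's actual class and perspective names are not known to us, so the ones below are assumptions) shows the Eclipse 3.x-era entry point that the platform runtime invokes to start the generic workbench:

```java
import org.eclipse.core.runtime.IPlatformRunnable;
import org.eclipse.swt.widgets.Display;
import org.eclipse.ui.PlatformUI;
import org.eclipse.ui.application.WorkbenchAdvisor;

// Minimal sketch of an Eclipse 3.x RCP entry point: the platform runtime
// calls run(), which starts the generic workbench; the plug-ins contribute
// everything else through extension points.
public class JCompApplication implements IPlatformRunnable {
    public Object run(Object args) throws Exception {
        Display display = PlatformUI.createDisplay();
        try {
            int code = PlatformUI.createAndRunWorkbench(display,
                new WorkbenchAdvisor() {
                    public String getInitialWindowPerspectiveId() {
                        return "jcomp.perspective"; // hypothetical perspective id
                    }
                });
            return (code == PlatformUI.RETURN_RESTART) ? EXIT_RESTART : EXIT_OK;
        } finally {
            display.dispose();
        }
    }
}
```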


Figure 7.3: A Snapshot of the JComp Java Componentization Kit.


7.2 Evaluation Criteria

Since the proposed framework aims to extract reusable components from an object-oriented system and to migrate the object-oriented design to a service-oriented architecture, the evaluation criteria need to address both component reusability and architectural improvement.

7.2.1 Component Reusability

The components acquired by applying the proposed framework are structurally reusable because their internal structures are encapsulated and the components are self-contained, and thus have no dependency upon entities outside of them. However, we still need a way to assess their reusability quantitatively.

Reusability Metric Suite

Components have two relatively static sources of information: the external documentation and the public interface. The external documentation is an important source of information that can greatly affect component reusability; however, such documentation is developed for a human audience, which makes it harder to measure. On the other hand, component interfaces are easily parsed by a computer, making them easier to measure. This is an important argument for developing reusability metrics based upon component interfaces. In this thesis, we aim to assess the reusability of the extracted components through the analysis of their interfaces as well as their internal methods. We define a reusability metric suite by selecting and adapting the metrics defined in [13, 25, 70, 91]:

Parameter Per Method (PPM) The PPM metric measures the mean size of the method declarations in the interface of the component, and it is defined as follows:

\[
PPM =
\begin{cases}
IPC / IMC & \text{if } IMC > 0; \\
0 & \text{otherwise.}
\end{cases}
\tag{7.1}
\]

CHAPTER 7. EMPIRICAL STUDIES

101

where the metric IPC (Interface Parameter Count) is the count of parameters of all public methods in the interface of the component, and the metric IMC (Interface Method Count) is the count of public methods in the interface of the component. It is believed that methods with fewer parameters are easier to understand, and so will be easier to reuse [58]. It follows that component interfaces with a lower PPM will tend to have lower complexity and hence better understandability.

Reference Parameter Density (RPD) The RPD metric measures the occurrence of reference parameters in an interface, and it is defined as follows:

\[
RPD =
\begin{cases}
IRPC / IPC & \text{if } IPC > 0; \\
0 & \text{otherwise.}
\end{cases}
\tag{7.2}
\]

where the metric IRPC (Interface Reference Parameter Count) is the count of reference-type parameters of all public methods in the interface of the component. It is believed that the use of references makes it more difficult to understand a program [87]. This also applies to interfaces, as arguments passed by reference tend to be more difficult to understand than arguments passed by value. A higher RPD indicates that an interface tends to be more difficult to understand. However, it is often necessary for reference arguments to be used so that useful functionality can be implemented. Therefore, a high value is not necessarily evidence of a poor interface, but it does suggest that good documentation is required [13].

Rate of Component Observability (RCO) The RCO metric measures the percentage of readable properties in all fields implemented within the interface of the component, and it is defined as follows:

\[
RCO =
\begin{cases}
IRMC / IFRC & \text{if } IFRC > 0; \\
0 & \text{otherwise.}
\end{cases}
\tag{7.3}
\]

where the metric IRMC (Interface Reader Method Count) is the count of public methods in the interface of the component that read a field, and the metric IFRC (Interface Field and Reference Count) is the count of fields and references in the interface of the component. RCO indicates the component's degree of observability for users of the component [91]. To understand the behavior of a component from the outside, the observability of the component should be high. However, when the observability is too high, it may be difficult for users to find an important readable property among all of the readable properties.

Rate of Component Customizability (RCC) The RCC metric measures the percentage of writable properties in all fields implemented within the interface of the component, and it is defined as follows:

\[
RCC =
\begin{cases}
IWMC / IFRC & \text{if } IFRC > 0; \\
0 & \text{otherwise.}
\end{cases}
\tag{7.4}
\]

where the metric IWMC (Interface Writer Method Count) is the count of public methods in the interface of the component that write a field. RCC indicates the component's degree of customizability for users of the component. To adapt the settings of a component from the outside to the user's requirements, the customizability of the component should be high. However, too high a customizability violates the encapsulation of the component and leads to greater opportunities for improper use [91].

Self-Completeness of Component's Return Values (SCCr) The SCCr metric measures the percentage of business methods without any return values among all business methods implemented in the component, and it is defined as follows:

\[
SCC_r =
\begin{cases}
VMC / MC & \text{if } MC > 0; \\
1 & \text{otherwise.}
\end{cases}
\tag{7.5}
\]

where the metric VMC (Void Method Count) is the count of public methods in the component that have a void return type, and the metric MC (Method Count) is the count of public methods in the component. SCCr indicates the component's degree of self-completeness and external dependency, based on the return values of its methods. The fewer business methods with return values, the smaller the possibility of the component having external dependencies. High self-completeness of a component (i.e., low external dependency) leads to high portability of the component [91].

Self-Completeness of Component's Parameters (SCCp) The SCCp metric measures the percentage of business methods without any parameters among all business methods implemented in the component, and it is defined as follows:

\[
SCC_p =
\begin{cases}
NPMC / MC & \text{if } MC > 0; \\
1 & \text{otherwise.}
\end{cases}
\tag{7.6}
\]

where the metric NPMC (None Parameter Method Count) is the count of public methods in the component that do not have any parameters. SCCp indicates the component's degree of self-completeness and external dependency, based on the parameters of its methods. The fewer business methods with parameters, the smaller the possibility of having dependencies outside the component [91].
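To make the six definitions concrete, the following sketch (our illustration, not part of JComp) computes the metric suite from the raw counts exactly as in Formulas (7.1)-(7.6):

```java
// Sketch: the interface metric suite computed from raw counts, following
// Formulas (7.1)-(7.6). The counts themselves would be collected by parsing
// the component's interface and its public methods.
public class InterfaceMetrics {
    int ipc;   // Interface Parameter Count
    int imc;   // Interface Method Count
    int irpc;  // Interface Reference Parameter Count
    int irmc;  // Interface Reader Method Count
    int iwmc;  // Interface Writer Method Count
    int ifrc;  // Interface Field and Reference Count
    int vmc;   // Void Method Count
    int npmc;  // None Parameter Method Count
    int mc;    // Method Count

    double ppm()  { return imc  > 0 ? (double) ipc  / imc  : 0.0; } // (7.1)
    double rpd()  { return ipc  > 0 ? (double) irpc / ipc  : 0.0; } // (7.2)
    double rco()  { return ifrc > 0 ? (double) irmc / ifrc : 0.0; } // (7.3)
    double rcc()  { return ifrc > 0 ? (double) iwmc / ifrc : 0.0; } // (7.4)
    double sccR() { return mc   > 0 ? (double) vmc  / mc   : 1.0; } // (7.5)
    double sccP() { return mc   > 0 ? (double) npmc / mc   : 1.0; } // (7.6)
}
```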

CHAPTER 7. EMPIRICAL STUDIES

104

Reusability Model

Reusability is a high-level quality of software components and hence is the result of the combination and interaction of many low-level properties. A component reusability model typically shows reusability as being composed of properties such as complexity, observability, customizability, and external dependency. From the user's point of view, we define a component reusability model as illustrated in Figure 7.4. This model is an adaptation of the reusability model introduced by Washizaki et al. [91]. The quality factors are selected only to provide an analysis of the reusability of a component; factors related to other aspects of component quality that are not considered important to reusability are left out. The choice of the three factors affecting reusability has been made on the basis of an analysis of the activities carried out when reusing a black-box component. We extend Washizaki's model to quantify the complexity of components by utilizing the Reference Parameter Density (RPD) metric proposed in [13]. Thus, the adapted model includes aspects related to the Understandability, Adaptability, and Portability factors given by ISO 9126 [1].

Figure 7.4: The Component Reusability Model (characteristic: Reusability; quality factors: Understandability, Adaptability, and Portability; criteria and metrics: Complexity (RPD), Observability (RCO), Customizability (RCC), and External Dependency (SCCr, SCCp)).

In order to quantify the reusability of the components generated by our framework, based on


the reusability model, we formulate the reusability measurement as follows:

\[
Reusability = w_{complexity} \cdot RPD + w_{observability} \cdot RCO
+ w_{customizability} \cdot RCC + w_{ex\text{-}dependency} \cdot \frac{SCC_r + SCC_p}{2}
\tag{7.7}
\]

By their definitions, the values of all metrics in the above formula are in [0, 1]. Since complexity and external dependency have a negative effect on reusability, the weights w_complexity and w_ex-dependency take values in [-1, 0], while observability and customizability have a positive effect and hence the weights w_observability and w_customizability take values in [0, 1]. Nevertheless, the sum of these four weights is set to 1. Consequently, the reusability value will be in [0, 1], and a higher value represents a higher level of reusability.
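As a simple illustration (ours, not the Generator plug-in's code), Formula (7.7) instantiated with the weights used in the case studies of this chapter reduces to a one-line weighted sum:

```java
// Sketch: Formula (7.7) with the weights used later in this chapter
// (w_complexity = -0.3, w_observability = 0.8, w_customizability = 0.8,
//  w_ex-dependency = -0.3; the four weights sum to 1).
public class ReusabilityModel {
    static double reusability(double rpd, double rco, double rcc,
                              double sccR, double sccP) {
        return -0.3 * rpd + 0.8 * rco + 0.8 * rcc
               - 0.3 * ((sccR + sccP) / 2.0);
    }
}
```

For example, a component with RPD = 0.2, RCO = 0.5, RCC = 0.5, SCCr = 0.4, and SCCp = 0.4 would score -0.06 + 0.4 + 0.4 - 0.12 = 0.62.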

7.2.2 Architectural Improvement

The software architecture of a program or computing system is the structure of the system, which comprises software components, the externally visible properties of those components, and the relationships among these components. The more complex a system's structure is, the more difficult it is to understand, and therefore to maintain. We wish to measure the degree of conformance of the target (restructured) architecture to the architectural principles of high intra-module cohesion and low inter-module coupling. In this thesis, we introduce a metric for measuring whether a large software system is "well-structured", based on the concept of entropy from information theory. Entropy from an information-theoretic point of view has been proposed in [78] for evaluating the structuredness of a software design. We adopt the definition of entropy for an object-oriented design introduced in [20] to compute the entropy of our source systems and target systems, respectively.


The smaller the entropy value, the better structured the system. We then compare the results to see whether the structures of our target systems are improved. The entropy of an object-oriented system S with n classes is defined as follows [20]:

\[
H(S) = - \sum_{i=1}^{n} p(c_i) \log_2 p(c_i)
\tag{7.8}
\]

It is assumed that the system is described in a standard class diagram format following UML notation for associations between classes. For a randomly selected unary association, p(ci) is defined as the probability that the association leads to class ci. The existence of such an association indicates that class ci provides services to the rest of the system, since it responds to messages sent to it. Within this context, bi-directional associations are treated as two separate unary associations. Classes are used as the units for entropy measurement because classes represent the most important fundamental building blocks of an object-oriented system and are an identifiable abstraction that is present both in designs and in implementations. To compute the entropy metric of the source system of our framework, we let n be the number of classes/interfaces of the source system and compute p(ci) as the ratio of the number of incoming edges of class ci over the total number of edges in the CIDG of the source system. To compute the entropy metric of the target system, we consider n to be the total number of components and classes/interfaces contained in the target system, and we then compute p(ci) in the same way as for the source system, except that there may exist associations between a class/interface and a component.
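A minimal sketch of this computation follows (ours; the incoming-edge counts are assumed to have been collected from the recovered graphs). Here p(ci) is the fraction of all edges that enter node ci, and nodes with no incoming edges contribute nothing to the sum:

```java
// Sketch: entropy of a system per Formula (7.8), where p(c_i) is the ratio
// of incoming edges of node i to the total number of edges in the graph
// (the CIDG for the source system, the CHGs for the target system).
public class DesignEntropy {
    static double entropy(int[] incomingEdges, int totalEdges) {
        double h = 0.0;
        for (int in : incomingEdges) {
            if (in > 0) {
                double p = (double) in / totalEdges;
                h -= p * (Math.log(p) / Math.log(2.0)); // log base 2
            }
        }
        return h;
    }
}
```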

7.3 Case Study: Jetty

In this section, we apply the JComp Java Componentization Kit to Jetty [46] to empirically evaluate the usefulness of the proposed SOC4J framework.

7.3.1 Statistics of Jetty

Jetty is an open-source, standards-based, full-featured web server implemented entirely in Java. It is released under the Apache 2.0 licence and is therefore free for commercial use and distribution. Jetty can be used as: i) a stand-alone traditional web server for static and dynamic content, ii) a dynamic content server behind a dedicated HTTP server such as Apache using the Apache module mod_proxy, and iii) an embedded component within a Java application.
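For the third usage mode, a rough embedding sketch (ours, written from the Jetty 5-era org.mortbay API; class and method names should be checked against the exact release, and the web application path is hypothetical) looks as follows:

```java
import org.mortbay.http.SocketListener;
import org.mortbay.jetty.Server;

// Rough sketch of embedding the Jetty 5-era server in a Java application:
// attach a socket listener on a port, deploy a web application, and start.
public class EmbeddedJetty {
    public static void main(String[] args) throws Exception {
        Server server = new Server();
        SocketListener listener = new SocketListener();
        listener.setPort(8080);
        server.addListener(listener);
        server.addWebApplication("/", "./webapps/myapp"); // hypothetical webapp
        server.start();
    }
}
```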

Project | Version | LOC | Java Source Files | Packages | Classes | Interfaces
Jetty | 5.1.10 | 44125 | 318 | 25 | 273 | 47

Table 7.1: Statistics of Jetty.

As shown in Table 7.1, we work on Jetty version 5.1.10, which was released on April 5, 2006. It has about 44K LOC of source code and consists of 318 Java source files that define 273 classes and 47 interfaces distributed over 25 packages.

7.3.2 Discussions on Obtained Results

In order to componentize the Jetty system, we first applied the JComp Java Componentization Kit to identify the business services embedded in the system. JComp then generated a self-contained component for each identified service. The Parser plug-in of JComp imported the source code of Jetty and built a set of source code models. These source code models were exported and stored as XML documents. The Modeler plug-in imported the source code models and recovered the architectural models, which are represented by the CIRG and CIDG. Like the source code models, the CIRG and CIDG were exported and stored as XML documents. First, based on the CIRG and CIDG, the Extractor plug-in, which implements the top-level service identification algorithm (i.e., Algorithm 5.2) and the low-level service identification algorithm (i.e., Algorithm 5.3), identified 33 top-level service


candidates from the CIDG. We then validated each candidate by examining its facade class set, and accepted 16 top-level services. These 16 top-level services represent the functionality of Jetty from the point of view of end users. Appendix A lists and describes all accepted top-level services of the Jetty web server. Figure 7.5 depicts the Accepted Service View of the Extractor plug-in, which displays all accepted top-level services of Jetty.

Figure 7.5: The Accepted Service View of the Extractor Plug-in.

The candidates that were not accepted are dead code, debugging modules, or testing modules. For instance, we found 8 dead classes in the org.mortbay.util package and a debugging module whose entry point is


the class org.mortbay.servlet.ProxyServlet.

ID | Top-Level Service | Classes/Interfaces | Low-Level Services
T1 | Win32 Server | 248 | 11
T2 | Dynamic Servlet Invoker | 207 | 12
T3 | Jetty Server MBean | 126 | 9
T4 | Proxy Request Handler | 113 | 7
T5 | XML Configuration MBean | 87 | 5
T6 | Web Application MBean | 86 | 6
T7 | Administration Servlet | 56 | 5
T8 | CGI Servlet | 49 | 5
T9 | Host Socket Listener | 46 | 5
T10 | Web Configuration | 34 | 3
T11 | Authentication Access Handler | 30 | 3
T12 | Servlet Response Wrapper | 27 | 2
T13 | IP Access Handler | 18 | 0
T14 | Multipart Form Data Filter | 16 | 2
T15 | HTML Script Block | 12 | 1
T16 | Applet Block | 9 | 1

Table 7.2: Top-Level Services Identified from Jetty.

After all the top-level services were validated, the Extractor plug-in identified the low-level services underneath each top-level service. Table 7.2 shows the atomic services and the identified low-level services for each top-level service. The atomic services of a top-level service are the Java classes or interfaces that implement the top-level service; they are represented by the nodes of the original SHG of the service. For example, as Table 7.3 shows, there are 11 low-level services identified from top-level service Win32 Server (i.e., top-level service T1). This top-level service runs Jetty as a Windows HTTP server. When identifying low-level services, we used Termination Criterion 5.1 described in Chapter 5 to terminate the iteration in Algorithm 5.3 by setting MQ = 0.75. In cases where the level of granularity of services is crucial, the user may use Termination Criterion 5.2 for Algorithm 5.3. As Figure 7.6 shows, we terminated the low-level service identification process at the fifth iteration. The final low-level services identified for


top-level service Win32 Server are shown in Table 7.3.

Figure 7.6: Iterations of the Service Aggregation Process of Top-Level Service Win32 Server (original SHG, first and second iterations, and final iteration).

To realize each identified service (both top-level and low-level services), the Generator plug-in generated a self-contained component for each service. Figure 7.7 illustrates the component hierarchy graph (CHG) of the top-level component Win32 Server. There are 11 low-level components contained in the top-level component. Furthermore, the Generator plug-in measured the reusability of each generated component by applying the component reusability model, that is, by computing Formula (7.7). In this empirical study, we set w_complexity = -0.3, w_observability = 0.8, w_customizability = 0.8, and w_ex-dependency = -0.3.


Figure 7.7: The CHG of Top-Level Component Win32 Server of Jetty.

Low-Level Component | Reusability
Jetty Server | 0.9
Service Handlers | 0.6
Resource Handler | 0.7
Security Handler | 0.7
Socket Listener | 0.8
HTTP Connection | 0.9
HTTP Request | 0.7
HTTP Response | 0.5
Web Application Context | 0.6
Servlet | 0.7
Servlet Handler | 0.8

Table 7.3: Low-Level Services Identified in Top-Level Service Win32 Server.


Figure 7.8 shows the reusability values of the top-level components and the average value of the low-level components underneath each top-level component. From Figure 7.8, it was observed that all top-level components, except C16, have reusability values above 0.5, and all the average values are between 0.6 and 0.8. Thus, we can conclude that the services identified from the Jetty project have a reasonable level of reusability.

Figure 7.8: The Reusability of Components Extracted from Jetty (reusability of each top-level component C1-C16 and the average reusability of the low-level components within each).

The Transformer plug-in transformed Jetty into a component-based system based on the generated components. We named the target system Jetty-JComp. As shown by Algorithm 6.1, Jetty-JComp has the same functionality as Jetty. Jetty-JComp now contains 16 independent JAR files. Each JAR file provides a top-level service and can be used independently. Also, each independent JAR file is itself a component-based system that consists of a set of JAR files. We have computed the entropy of both Jetty and Jetty-JComp by applying Formula (7.8). When computing the entropy of Jetty-JComp, we used the component hierarchy graphs instead of the CIDG because Jetty-JComp is comprised of components. We found that the entropy of Jetty-JComp was reduced by 45.5% compared to the original Jetty project. Hence, we can conclude that our transformation dramatically improves the structure of the system.

In Table 7.4, we summarize the time and space complexity of the proposed service-oriented

componentization framework as a function of the source code size of the Jetty project. The experiment was carried out on a Windows desktop with an Intel Pentium IV 3.4 GHz CPU and 2 GB of memory.

Measurement Item | Value
Case Study Size (KLOC) | 44.1
Source Code Modeling Time (min:sec) | 2:18
Source Code Model Space (MB) | 1.43
Architecture Modeling Time (min:sec) | 4:19
Architecture Model Space (MB) | 1.57
Top-Level Service Identification Time (min:sec) | 6:45
Average Low-Level Service Identification Time (sec) | 66

Table 7.4: Some Time and Space Statistics of the SOC4J Framework on the Case Study: Jetty.

7.4 Case Study: Apache Ant

In this section, we apply the JComp Java Componentization Kit to another open source Java project, namely Apache Ant [2], to further evaluate the usefulness of the proposed SOC4J framework.

7.4.1 Statistics of Apache Ant

Apache Ant is a software tool for automating software build processes. It is similar to make, but is written in the Java language and is primarily intended for use with Java. The most immediately noticeable difference between Ant and make is that Ant uses a file in XML format to describe the build process and its dependencies, whereas make has its own Makefile format. By default, the XML file is named build.xml. Ant is an Apache project. It is open source software, and is released under the Apache Software License 2.0. As shown in Table 7.5, we work on Apache Ant version 1.6.5, which is the latest version. It has around 86K LOC of source code and consists of 690 Java source files that define 640 classes and 60 interfaces distributed over 70 packages.

Project | Version | LOC | Java Source Files | Packages | Classes | Interfaces
Apache Ant | 1.6.5 | 86468 | 690 | 70 | 640 | 60

Table 7.5: Statistics of Apache Ant.

7.4.2 Discussions on Obtained Results

To componentize the Apache Ant system, as we did for Jetty, we first applied the JComp Java Componentization Kit to identify the business services embedded in the system. Then, JComp generated a self-contained component for each identified service.

ID | Top-Level Service | Classes/Interfaces | Low-Level Services
T1 | Project Building | 205 | 34
T3 | WAR File Creation | 152 | 17
T4 | TAR File Creation | 144 | 20
T6 | JUnit Invocation | 114 | 17
T8 | JAR File Creation | 113 | 17
T11 | Unit Test Execution | 86 | 14
T14 | File Content Loading | 80 | 15
T17 | SSH File Copy | 67 | 19
T21 | Zip File Creation | 57 | 15
T25 | XML File Checking | 54 | 9
T30 | Java Class Execution | 45 | 11
T31 | Dependency Manifest Generation | 45 | 8
T48 | GZip File Expansion | 34 | 4
T49 | File Concatenation | 34 | 6
T53 | Telnet Session Generation | 34 | 8
T63 | CVS Repository Retrieval | 29 | 4
T69 | JavaCC Invocation | 26 | 5
T74 | File Permission Change | 23 | 5
T85 | URL File Retrieval | 18 | 4
T92 | String Replacement | 16 | 4

Table 7.6: Selected Top-Level Services Identified from Apache Ant.

The Parser plug-in of JComp imported the source code of Apache Ant and built a set

of source code models. These source code models were exported and stored as XML documents. The Modeler plug-in imported the source code models and recovered the architectural models, which are represented by the CIRG and CIDG. Like the source code models, the CIRG and CIDG were exported and stored as XML documents. First, based on the CIRG and CIDG, the Extractor plug-in identified 236 top-level service candidates from the CIDG. Then we validated each candidate by examining its facade class set. Finally, we accepted 101 top-level services. Appendix B lists and describes all accepted top-level services of the Apache Ant system. These 101 top-level services represent the functionality of Apache Ant from the point of view of end users. We also found that some candidates were dead code, debugging modules, or testing modules, and hence they were not accepted as top-level services.

Low-Level Service | Reusability
File Output | 0.8
Zip File Set | 0.6
Task Generator | 0.9
Identity Mapper | 0.7
Project Loader | 0.5
Zip Scanner | 0.9
File Packing | 0.8
File Mapper | 0.5
File Scanner | 0.6
Resource Selector | 0.7
File Entry | 0.8
Conversion Rules | 0.9
Exception Handle | 0.7
Resource Factory | 0.6
Type Integers | 0.5
File Field | 0.7
Resource Handler | 0.8

Table 7.7: Low-Level Services Identified in Top-Level Service WAR File Creation.

After all top-level services were validated, the Extractor plug-in identified the low-level services underneath each top-level service. We randomly selected 20 top-level services from the


101 accepted services and further identified the low-level services underneath each of these 20 top-level services. Table 7.6 shows the atomic services and the identified low-level services for each selected top-level service. For example, as Table 7.7 shows, there are 17 low-level services identified from top-level service WAR File Creation (i.e., top-level service T3). The WAR File Creation service packages Web applications: it packages a set of files into Web Application Archive (WAR) files, placing them in the WEB-INF/lib, WEB-INF/classes, or WEB-INF directories of the archive. We used Termination Criterion 5.2 described in Chapter 5 to terminate the iteration in Algorithm 5.3 by examining the level of granularity of the low-level services.

Figure 7.9: The CHG of Top-Level Component WAR File Creation of Apache Ant.

Again, to realize each identified service (both top-level and low-level services), the Generator plug-in generated a self-contained component for each service. Figure 7.9 illustrates the component hierarchy graph (CHG) of the top-level component WAR File Creation. There are 17


low-level components contained in the top-level component. Furthermore, the Generator plug-in measured the reusability of each generated component by applying the component reusability model, that is, by computing Formula (7.7). As we did for the Jetty project, we set w_complexity = -0.3, w_observability = 0.8, w_customizability = 0.8, and w_ex-dependency = -0.3.

Figure 7.10: The Reusability of Components Extracted from Apache Ant (reusability of each selected top-level component and the average reusability of its low-level components).

Figure 7.10 shows the reusability values of the selected top-level components of Apache Ant and the average value of the low-level components underneath each top-level component. From Figure 7.10, it was observed that all top-level components, except C30, have reusability values above 0.5, and all the average values are between 0.5 and 0.9. Thus, we can conclude that the services identified from the Apache Ant project have a reasonable level of reusability. Based on the generated components, the Transformer plug-in transformed Apache Ant into a component-based system. We named the target system Apache Ant-JComp. As shown by Algorithm 6.1, Apache Ant-JComp has the same functionality as Apache Ant. Apache Ant-JComp now contains 101 independent JAR files. Each JAR file provides a top-level service and can be used independently. Since we have only further decomposed 20 top-level components, each of these 20 corresponding JAR files is a component-based system that consists of a set of JAR files (i.e., low-level components). Also, we have computed the entropy of both Apache Ant and


Apache Ant-JComp by applying Formula (7.8). Again, when computing the entropy of Apache Ant-JComp, we used the component hierarchy graphs instead of the CIDG because Apache Ant-JComp is comprised of components. We found that the entropy of Apache Ant-JComp was reduced by 16.3% compared to the original Apache Ant project. The reduction in entropy is not as large as that of Jetty-JComp because we componentized only 20 of the 101 top-level services identified from the Apache Ant project.

In Table 7.8, we summarize the time and space complexity of the proposed service-oriented componentization framework as a function of the source code size of the Apache Ant project. The experiment was carried out on a Windows desktop with an Intel Pentium IV 3.4 GHz CPU and 2 GB of memory.

Measurement Item | Value
Case Study Size (KLOC) | 86.5
Source Code Modeling Time (min:sec) | 5:20
Source Code Model Space (MB) | 3.34
Architecture Modeling Time (min:sec) | 9:15
Architecture Model Space (MB) | 3.92
Top-Level Service Identification Time (min:sec) | 19:43
Average Low-Level Service Identification Time (sec) | 54

Table 7.8: Some Time and Space Statistics of the SOC4J Framework on the Case Study: Apache Ant.

7.5 Summary

The design and implementation of supporting tools are fundamental requirements for assessing the practical use of a re-engineering approach. In this chapter, we developed a toolkit implementing the proposed componentization framework as an Eclipse Rich Client Platform (RCP) application. The important aspects of the proposed framework have been tested through a series of experiments. The empirical studies have shown that the proposed framework is effective in identifying services from an existing Java system and reconstructing it into a component-based system.

Chapter 8

Future Directions and Conclusions

In this chapter, we summarize the findings of this thesis and outline future research directions that may arise from this research. In Section 8.1, we present the contributions of this thesis, and in Section 8.2, we discuss some future work that could extend this research. Finally, we make some concluding remarks in Section 8.3.

8.1 Contributions

The principal contributions of this thesis were stated in Chapter 1. Based on the material already presented, we discuss them in more detail:

• The design and implementation of comprehensive graph representations of an object-oriented system at different levels of abstraction. These graph representations include the class/interface relationship graph (CIRG), the class/interface dependency graph (CIDG), modularized CIDGs (MCIDGs), service hierarchy graphs (SHGs), and component hierarchy graphs (CHGs). Each graph represents the system at a different level of abstraction.

• The exploration of an incremental program comprehension approach, including


describing an object-oriented software system using different concurrent views, each of which addresses a specific set of concerns of the system. The SOC4J framework extracts four views to understand an object-oriented software system. The extracted source code models provide the basic view (BView), the recovered architectural models build the structural view (SView), the identified top-level services together with their service hierarchy graphs give the service view (ServView), and the generated top-level components together with their component hierarchy graphs introduce the component view (CompView) of the system. Each view assists the user in understanding the system from a different perspective.

• The design and implementation of an efficient and effective methodology for identifying and realizing critical business services embedded in an existing object-oriented system. The business services embedded in an existing system were categorized into two classes: Top-Level Services (TLS) and Low-Level Services (LLS). A top-level service is a service that is not used by any other service of the system. However, it may contain a hierarchy of low-level services further describing the service. From the requester's point of view, top-level services are provided by the system and can be accessed independently. A low-level service is a service that is underneath a top-level service and may be agglomerated with other low-level services to yield a new service with a higher level of granularity. The service identification methodology is a combination of top-down and bottom-up techniques. In the top-down portion of the methodology, we identify the top-level services and the atomic services underneath each top-level service by identifying the entry points of the system. In the bottom-up portion, we aggregate the atomic services to identify services with a higher level of granularity by applying a series of graph transformations. The service aggregation is performed incrementally.

• The design and implementation of an object-oriented restructuring methodology that transforms the typically monolithic architecture of an existing system into a more flexible


service-oriented architecture. For each identified service (both top-level and low-level services), we generate a self-contained component. A component that realizes a top-level service is called a Top-Level Component (TLC), while a component that realizes a low-level service is called a Low-Level Component (LLC). Based on the extracted components, a meta-model for the component-based target system is designed. We introduce a reconstruction technique that automatically reconstructs the existing source system into a component-based system.

• The design and implementation of a prototype system that supports the identification and realization of critical business services embedded in a Java software system and the componentization of the Java system. The prototype is designed as an Eclipse Rich Client Platform (RCP) application and named the JComp Java Componentization Kit. A set of JComp plug-ins has been developed to implement the techniques introduced in the framework. A set of empirical studies has been performed with the JComp toolkit.

8.2 Future Work

Several new research questions have arisen from this work. We believe that significant improvements can be made in some aspects of the presented approach. Possible future work includes the following:

• To apply dynamic analysis of system behavior within the first stage of the SOC4J framework to improve the detection of class relationships.

• To investigate algorithmic processes that can be used to automatically categorize the identified services.

• To measure the reusability and maintainability of the extracted components more precisely.


• To verify that our definitions are consensual with respect to developers' intent when performing software re-engineering.

• To apply our componentization toolkit, JComp, to more real-life programs and to validate the results with the program developers.

• To extend our approach to other programming languages, for instance, C++ programs, or even C and COBOL systems.

• To enrich our approach with more flavors of binary class relationships, such as shared-aggregation and container relationships.

• To improve the precision of the service identification by considering design patterns, alternate implementations of the algorithms, and alternate definitions of the class relationships.

8.3 Conclusions

In this thesis, we presented a service-oriented componentization framework for Java systems. The framework componentizes an object-oriented system to re-modularize the existing assets for supporting service functionality. We introduced an approach for identifying, modeling, and packaging critical business services embedded in an existing system. In addition to producing reusable components realizing the identified services, the framework also provides a component-based integration approach to migrate an object-oriented design to a service-oriented architecture. Our initial evaluation has shown that our framework is effective in identifying services from an object-oriented design and migrating it to a service-oriented architecture. Moreover, the BView, SView, ServView, and CompView built by our framework help users gain a program understanding of the system.

Appendix A

Top-Level Services of Jetty

ID | Top-Level Service | Atomic Services | Description
T1 | Win32 Server | 248 | Runs Jetty as a Windows HTTP server.
T2 | Dynamic Servlet Invoker | 207 | Invokes anonymous servlets that have not been defined in the web.xml or by other means.
T3 | Jetty Server MBean | 126 | Configures a request log, which records all incoming HTTP requests.
T4 | Proxy Request Handler | 113 | Makes the HTTP/1.1 proxy requests.
T5 | XML Configuration MBean | 87 | Performs all required configurations for running the SESM applications in Jetty containers.
T6 | Web Application MBean | 86 | Manages web applications' lifecycle.
T7 | Administration Servlet | 56 | Jetty Administration Servlet. Allows start and/or stop of server components and control of debug parameters.
T8 | CGI Servlet | 49 | Runs CGI servlets on Windows.
T9 | Host Socket Listener | 46 | Declares a socket listener for a Jetty HTTP server.
T10 | Web Configuration | 34 | Creates web container configurations.

Table A.1: Top-Level Services of Jetty (1).

ID | Top-Level Service | Atomic Services | Description
T11 | Authentication Access Handler | 30 | Creates an authentication access handler for HTTP pages.
T12 | Servlet Response Wrapper | 27 | Wraps a Jetty HTTP response as a 2.2 Servlet response.
T13 | IP Access Handler | 18 | Creates a handler to authenticate access from certain IP addresses.
T14 | Multipart Form Data Filter | 16 | Decodes the multipart/form-data stream sent by an HTML form that uses a file input item.
T15 | HTML Script Block | 12 | Represents the script block in an HTML form.
T16 | Applet Block | 9 | Represents the applet block in an HTML form.

Table A.2: Top-Level Services of Jetty (2).

Appendix B

Top-Level Services of Apache Ant

ID | Top-Level Service | Atomic Services | Description
T1 | Project Building | 205 | Runs Ant on a supplied build file.
T2 | JAR File Expansion | 164 | Unzips a jar file.
T3 | WAR File Creation | 152 | Creates Web Application Archive files.
T4 | TAR File Creation | 144 | Creates a tar archive.
T5 | Zip File Expansion | 117 | Unzips a zip file.
T6 | SQL Statement Execution | 116 | Executes a series of SQL statements via JDBC to a database.
T7 | JUnit Invocation | 114 | Runs tests from the JUnit testing framework.
T8 | JAR File Creation | 113 | Jars a set of files.
T9 | TAR File Expansion | 95 | Expands a tar file.
T10 | File Packing | 92 | Packs a file using the GZip or BZip2 algorithm.
T11 | Unit Test Execution | 86 | Executes a unit test in the org.apache.testlet framework.
T12 | WAR File Expansion | 83 | Unzips a war file.
T13 | RPM Invocation | 81 | Invokes the rpm executable to build a Linux installation file.
T14 | File Content Loading | 80 | Loads a file's contents as Ant properties.
T15 | Metamata MParse Invocation | 71 | Invokes the Metamata MParse compiler-compiler on a grammar file.
T16 | CAB File Creation | 67 | Creates Microsoft CAB Archive files.

Table B.1: Top-Level Services of Apache Ant (1).

ID | Top-Level Service | Atomic Services | Description
T17 | SSH File Copy | 67 | Copies files to or from a remote server using SSH.
T18 | Build File DTD Generation | 67 | Generates a DTD for Ant build files that contains information about all tasks currently known to Ant.
T19 | File Encoding Converting | 65 | Converts files from native encodings to ASCII with escaped Unicode.
T20 | Task Adding | 59 | Adds a task definition to the current project, such that this new task can be used in the current project.
T21 | Zip File Creation | 57 | Creates a zip file.
T22 | Macro Task Definition | 56 | Defines a new task as a macro built upon other tasks.
T23 | Path Converting | 56 | Converts a path format from one platform to another platform.
T24 | FTP Implementation | 56 | Implements a basic FTP client that can send, receive, list, and delete files, and create directories.
T25 | XML File Checking | 54 | Checks that XML files are valid (or only well-formed).
T26 | File Expansion | 52 | Expands a file packed using GZip or BZip2.
T27 | Directory Property Setting | 51 | Sets a property to the value of the specified file up to, but not including, the last path element.
T28 | File Availability Property Setting | 50 | Sets a property if a specified file, directory, class in the classpath, or JVM system resource is available at runtime.
T29 | Path Property Setting | 50 | Sets a property to the last element of a specified path.
T30 | Java Class Execution | 45 | Executes a Java class within the running (Ant) VM, or in another VM if the fork attribute is specified.
T31 | Dependency Manifest Generation | 45 | Generates a manifest that declares all the dependencies in manifest.
T32 | Key Generation | 43 | Generates a key in a key store.
T33 | Property Setting | 43 | Sets a property (by name and value), or a set of properties (from a file or resource) in the project.
T34 | XML Property File Loading | 43 | Loads property values from a well-formed XML file.
T35 | Web Proxy Property Setting | 43 | Sets Java's web proxy properties.
T36 | XML Report Generation | 43 | Generates an XML report of the changes recorded in a CVS repository.

Table B.2: Top-Level Services of Apache Ant (2).

ID | Top-Level Service | Atomic Services | Description
T37 | File Token Identification | 40 | Identifies keys in files, delimited by special tokens, and translates them with values read from resource bundles.
T38 | Java Class Instrumenting | 39 | Instruments Java classes using the iContract DBC preprocessor.
T39 | Existing Task Instrumenting | 39 | Defines a new task by instrumenting an existing task with default values for attributes or child elements.
T40 | File Loading | 39 | Loads a file into a property.
T41 | Splash Screen Display | 38 | Displays a splash screen.
T42 | File Set Packing | 37 | GZips a set of files.
T43 | CVS Pass Entry Adding | 37 | Adds entries to a .cvspass file.
T44 | File Checksum Generation | 36 | Generates a checksum for a file or set of files.
T45 | Default Exclude Pattern Modification | 36 | Modifies the list of default exclude patterns from within the build file.
T46 | JDepend Invocation | 35 | Invokes the JDepend parser.
T47 | Time Stamp Setting | 35 | Sets the DSTAMP, TSTAMP, and TODAY properties in the current project, based on the current date and time.
T48 | GZip File Expansion | 34 | Expands a GZip file.
T49 | File Concatenation | 34 | Concatenates multiple files into a single one or to Ant's logging system.
T50 | Directory Synchronization | 34 | Synchronizes two directory trees.
T51 | Condition Property Setting | 34 | Sets a property if a certain condition holds true.
T52 | File Version Checking | 34 | Sets a property if a given target file is newer than a set of source files.
T53 | Telnet Session Generation | 34 | Automates a remote telnet session.
T54 | Attribute Permission Change | 33 | Changes the permissions and/or attributes of a file or all files inside the specified directories.
T55 | Build File Importing | 32 | Imports another build file and potentially overrides targets in it with the user's own targets.
T56 | JJTree Invocation | 32 | Invokes the JJTree preprocessor for the JavaCC compiler-compiler.
T57 | Resource Search | 32 | Finds a class or resource.
T58 | Temp File Generation | 31 | Generates a name for a new temporary file and sets the specified property to that name.
T59 | Remote Command Execution | 30 | Executes a command on a remote server using SSH.
T60 | Manifest Creation | 29 | Creates a manifest file.

Table B.3: Top-Level Services of Apache Ant (3).

ID | Top-Level Service | Atomic Services | Description
T61 | Documentation Generation | 29 | Generates code documentation using the javadoc tool.
T62 | XSLT Transformation | 29 | Processes a set of documents via XSLT.
T63 | CVS Repository Retrieval | 29 | Handles packages/modules retrieved from a CVS repository.
T64 | SMTP Email Sending | 28 | Sends SMTP emails.
T65 | User Input | 28 | Allows user interaction during the build process by displaying a message and reading a line of input from the console.
T66 | JProbe Invocation | 27 | Invokes the JProbe suite.
T67 | Stylebook Invocation | 26 | Executes the Apache Stylebook documentation generator.
T68 | File Comparison | 26 | Compares a set of source files with a set of target files; if any of the source files is newer than any of the target files, all the target files are removed.
T69 | JavaCC Invocation | 26 | Invokes the JavaCC compiler-compiler on a grammar file.
T70 | Regular Expression Replacement | 25 | Replaces the occurrence of a given regular expression with a substitution pattern in a file or set of files.
T71 | JJDoc Invocation | 25 | Invokes the JJDoc documentation generator for the JavaCC compiler-compiler.
T72 | Current Property Listing | 25 | Lists the current properties.
T73 | EAR File Creation | 24 | Creates Enterprise Application Archive files.
T74 | File Permission Change | 23 | Changes the permissions of a file or all files inside the specified directories.
T75 | File Deletion | 23 | Deletes either a single file, all files and subdirectories in a specified directory, or a set of files specified by one or more FileSets.
T76 | Data Type Adding | 23 | Adds a data-type definition to the current project, such that this new type can be used in the current project.
T77 | Change Report File Generation | 23 | Generates an XML-formatted report file of the changes between two tags or dates recorded in a CVS repository.
T78 | File Move | 21 | Moves a file to a new file or directory, or a set(s) of file(s) to a new directory.
T79 | Log Recording | 21 | Runs a listener that records the logging output of the build-process events to a file.
T80 | Project Building Termination | 21 | Exits the current build by throwing a BuildException, optionally printing additional information.

Table B.4: Top-Level Services of Apache Ant (4).

ID | Top-Level Service | Atomic Services | Description
T81 | Property File Creation | 21 | Creates or modifies property files.
T82 | MMetrics Computation | 19 | Computes the metrics of a set of Java source files, using the Metamata Metrics/WebGain Quality Analyzer source-code analyzer.
T83 | Script Execution | 19 | Executes a script in an Apache BSF-supported language.
T84 | TAB Updating | 18 | Modifies a file to add or remove tabs, carriage returns, line feeds, and EOF characters.
T85 | URL File Retrieval | 18 | Gets a file from a URL.
T86 | Extension Checking | 18 | Checks whether an extension is present in a file set or an extension set. If the extension is present, the specified property is set.
T87 | Command Execution | 17 | Executes a system command.
T88 | File Modification Time Change | 17 | Changes the modification time of a file and possibly creates it at the same time.
T89 | Sound File Execution | 17 | Plays a sound file at the end of the build, according to whether the build failed or succeeded.
T90 | ANTLR Invocation | 17 | Invokes the ANTLR Translator generator on a grammar file.
T91 | JNI Header Generation | 17 | Generates JNI headers from a Java class.
T92 | String Replacement | 16 | Replaces the occurrence of a given string with another string in a selected file.
T93 | MAudit Computation | 15 | Performs static analysis on a set of Java source-code and byte-code files, using the Metamata Metrics/WebGain Quality Analyzer source-code analyzer.
T94 | Directory Creation | 15 | Creates a directory.
T95 | Text Output | 15 | Echoes text to System.out or to a file.
T96 | File Copying | 13 | Copies a file or FileSet to a new file or directory.
T97 | File Group Ownership Change | 12 | Changes the group ownership of a file or all files inside the specified directories.
T98 | Project Filter Setting | 12 | Sets a token filter for the project, or reads multiple token filters from a specified file and sets these as filters.
T99 | Source Code Extraction | 12 | Allows the user to extract the latest edition of the source code from a PVCS repository.
T100 | File Ownership Change | 11 | Changes the owner of a file or all files inside the specified directories.
T101 | JAR File Information Display | 9 | Displays the "Optional Package" and "Package Specification" information contained within the specified jars.

Table B.5: Top-Level Services of Apache Ant (5).

Bibliography

[1] Software product evaluation - quality characteristics and guidelines for their use. ISO/IEC Standard 9126, 1991.

[2] Apache Ant. A Java-based build tool. http://ant.apache.org/, 2006.

[3] Jagdish Bansiya and Carl G. Davis. A class cohesion metric for object-oriented designs. Journal of Object-Oriented Programming, 11:47–52, January 1999.

[4] Jagdish Bansiya and Carl G. Davis. A hierarchical model for object-oriented design quality assessment. IEEE Transactions on Software Engineering, 28:4–17, January 2002.

[5] V. Basili, L. Briand, and W. Melo. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 22:751–761, October 1996.

[6] L. Belady and C. Evangelisti. System partitioning and its measure. Journal of Systems and Software, 2:23–29, 1981.

[7] Martin Bernauer, Gerti Kappel, and Gerhard Kramler. Representing XML Schema in UML - a comparison of approaches. http://www.big.tuwien.ac.at/research/publications/2003/1303.pdf, 2003.

[8] Martin Bernauer, Gerti Kappel, and Gerhard Kramler. A UML profile for XML Schema. Technical report, Business Informatics Group, Vienna University of Technology, 2003.


[9] T. Biggerstaff, B. Mitbander, and D. Webster. The concept assignment problem in program understanding. In Proceedings of the 15th International Conference on Software Engineering (ICSE), pages 482–498, Baltimore, Maryland, USA, May 1993.

[10] Bison. The YACC-compatible parser generator. http://dinosaur.compilertools.net/#bison, 2006.

[11] G. Booch, M. Christerson, M. Fuchs, and J. Koistinen. UML for XML Schema mapping specification. Rational White Paper, December 1999.

[12] B. Borges, K. Holley, and A. Arsanjani. Delving into service-oriented architecture. http://www.developer.com/java/ent/article.php/3409221, 2006.

[13] Marcus A. S. Boxall and Saeed Araban. Interface metrics for reusability analysis of components. In Proceedings of the Australian Software Engineering Conference (ASWEC), pages 40–51, April 2004.

[14] L. C. Briand, J. W. Daly, and J. K. Wust. A unified framework for coupling measurement in object-oriented systems. IEEE Transactions on Software Engineering, 25:91–121, January-February 1999.

[15] L. C. Briand, S. Morasca, and V. Basili. Measuring and assessing maintainability at the end of high-level design. In Proceedings of the IEEE Conference on Software Maintenance (ICSM), pages 74–81, Montreal, Canada, September 1993.

[16] A. Brown, S. Johnston, and K. Kelly. Using service-oriented architecture and component-based development to build web service applications. Santa Clara, CA: Rational Software Corporation, 2002.

[17] E. Burd and M. Munro. Evaluating the use of dominance trees for C and COBOL. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 401–410, September 1999.


[18] Gianluigi Caldiera and Victor R. Basili. Identifying and qualifying reusable software components. IEEE Computer, 24:61–70, February 1991.

[19] David Carlson. Modeling XML Applications with UML: Practical e-Business Applications. Addison Wesley Professional, 2001.

[20] Alexander Chatzigeorgiou and George Stephanides. Entropy as a measure of object-oriented design quality. In Proceedings of the Balkan Conference in Informatics (BCI), pages 565–573, November 2003.

[21] K. Chen and V. Rajlich. Case study of feature location using dependence graph. In Proceedings of the 8th International Workshop on Program Comprehension (IWPC), pages 241–249, Limerick, Ireland, June 2000.

[22] S. R. Chidamber and C. F. Kemerer. Towards a metrics suite for object oriented design. In Proceedings of the Conference on Object-Oriented Programming: Systems, Languages and Applications (OOPSLA), SIGPLAN Notices 26(11), November 1991.

[23] S. R. Chidamber and C. F. Kemerer. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20:476–493, June 1994.

[24] Y. Chiricota, F. Jourdan, and G. Melancon. Software components capture using graph clustering. In Proceedings of the International Workshop on Program Comprehension (IWPC), pages 217–226, May 2003.

[25] E. Cho, M. Kim, and S. Kim. Component metrics to measure component quality. In Proceedings of the 8th Asia-Pacific Software Engineering Conference (APSEC), pages 419–426, Macau SAR, China, December 2001.


[26] D. Cimitile and G. Visaggio. Software salvaging and call dominance tree. Journal of Systems and Software, 28:117–127, February 1992.

[27] R. Conrad, D. Scheffner, and J. C. Freytag. XML conceptual modeling using UML. In Proceedings of the 19th International Conference on Conceptual Modeling, pages 558–571, Salt Lake City, Utah, USA, October 2000.

[28] J. Daly, A. Brooks, J. Miller, J. Topber, and M. Wood. The effect of inheritance depth on the maintainability of object-oriented software. Empirical Software Engineering: An International Journal, 1:751–761, February 1996.

[29] J. Eder, G. Kappel, and M. Schrefl. Coupling and cohesion in object-oriented systems. Technical report, University of Klagenfurt, 1994.

[30] Thomas Eisenbarth, Rainer Koschke, and Daniel Simon. Locating features in source code. IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.

[31] L. H. Etzkorn and C. G. Davis. Automatically identifying reusable OO legacy code. Computer, 30:66–71, October 1997.

[32] R. Fanta and V. Rajlich. Reengineering object-oriented code. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 238–246, Bethesda, Maryland, March 1998.

[33] Flex. A fast scanner generator. http://dinosaur.compilertools.net/#flex, 2006.

[34] P. Fremantle, S. Weerawarana, and R. Khalaf. Enterprise services. Communications of the ACM, 45(10):77–80, 2002.

[35] G. C. Gannod, S. V. Mudiam, and T. E. Lindquist. An architectural-based approach for synthesizing and integrating adapters for legacy software. In Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE), pages 128–139, Brisbane, Australia, November 2000.


[36] Jean-François Girard and Rainer Koschke. Finding components in a hierarchy of modules: a step towards architectural understanding. In Proceedings of the 13th International Conference on Software Maintenance (ICSM), pages 58–65, Bari, Italy, October 1997.

[37] U. Gleich and T. Kohler. Tool-support for reengineering of object-oriented systems. In Proceedings of ESEC-FSE/Workshop on Object-Oriented Reengineering, pages 43–51, Zurich, Switzerland, September 1997.

[38] W. G. Griswold, J. J. Yuan, and Y. Kato. Exploiting the map metaphor in a tool for software evolution. In Proceedings of the 23rd International Conference on Software Engineering (ICSE), pages 265–274, Toronto, Canada, May 2001.

[39] CGI Group. Component mining: An approach for identifying reusable components from legacy systems. http://www.cgi.com/cgi/pdf/cgi_whpr_07_mining_e.pdf, 2004.

[40] W3C Working Group. Web service architecture. http://www.w3.org/TR/2004/NOTE-ws-arch-20040211/, 2006.

[41] Yann-Gaël Guéhéneuc and Hervé Albin-Amiot. Recovering binary class relationships: Putting icing on the UML cake. In Proceedings of the 19th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 301–314, Vancouver, Canada, October 2004.

[42] George Yanbing Guo, Joanne M. Atlee, and Rick Kazman. A software architecture reconstruction method. In Proceedings of the 1st Working IFIP Conference on Software Architecture, pages 225–243, San Antonio, TX, USA, February 1999.


[43] D. Hutchens and V. Basili. System structure analysis: Clustering with data bindings. IEEE Transactions on Software Engineering, 11(8):749–757, August 1985.

[44] JavaCC. Java compiler compiler. https://javacc.dev.java.net/, 2006.

[45] Jess. A rule engine for the Java platform. http://www.jessrules.com/jess/index.shtml, 2005.

[46] Jetty. A Java HTTP server and servlet container. http://jetty.mortbay.org/jetty/index.html, 2006.

[47] Jini. Jini network technology. http://www.sun.com/software/jini/, 2006.

[48] Rick Kazman and S. Jeromy Carrière. View extraction and view fusion in architectural understanding. In Proceedings of the 5th International Conference on Software Reuse, pages 290–299, Victoria, BC, Canada, May 1998.

[49] Wing Lam and Venky Shankararaman. An enterprise integration methodology. IT Professional, 6(2):40–49, 2004.

[50] Lex. A lexical analyzer generator. http://dinosaur.compilertools.net/#lex, 2006.

[51] Shimin Li and Ladan Tahvildari. JComp: A reuse-driven componentization framework for Java applications. In Proceedings of the International Conference on Program Comprehension (ICPC), pages 264–267, Athens, Greece, June 2006.

[52] Shimin Li and Ladan Tahvildari. A service-oriented componentization framework for Java software systems. In Proceedings of the 13th IEEE Working Conference on Reverse Engineering (WCRE), Benevento, Italy, October 2006.

[53] Jing Luo, Renkuan Jiang, Lu Zhang, Hong Mei, and Jiasu Sun. An experimental study of two graph analysis based component capture methods for object-oriented systems. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 217–226, May 2003.

BIBLIOGRAPHY

136

ceedings of the International Conference on Software Maintenance (ICSM), pages 217– 226, May 2003. [54] S. Mancoridis, B. Mitchell, Y. Chen, and E. R. Gansner. Bunch: A clustering tool for the recovery and maintenance of software system structures. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 50–62, Oxford, UK, August 1999. [55] S. Mancoridis, B. Mitchell, C. Rorres, and Y. Chen. Using automatic clustering to produce high-level system organizations of source code. In Proceedings of International Workshop on Program Comprehension (IWPC), pages 45–53, Ischia, Italy, June 1998. [56] M. Marin, A. Deursen, and L. Moonen. Identifying aspects using fan-in analysis. In Proceedings of the 11th Working Conference on Reverse Engineering (WCRE), pages 132– 141, Delft University of Technology, Netherlands, November 2004. [57] J. Martin and H. A. Muller. C to Java migration experiences. In Proceedings of the 6th European Conference on Software Maintenance and Reengineering, pages 143–153, Budapest, Hungary, March 2003. [58] Steve McConnell. Code Complete. Microsoft Press, Redmond, Washington, USA, 1993. [59] Alok Mehta and George T. Heineman. Evolving legacy systems features using regression test cases and components. In the 4th International Workshop on Principles of Software (IWPSE), pages 190–193, Vienna, Austria, September 2001. [60] Alok Mehta and George T. Heineman. Evolving legacy system features into fine-grained components. In the 24th International Conference on Software Engineering (ICSE), pages 417–427, Buenos Aires, Argentina, May 2002.

[61] Robert Morgan. Building an Optimizing Compiler. Butterworth-Heinemann, Boston, Massachusetts, 1998.
[62] S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers, San Francisco, California, 1997.
[63] H. Müller, M. Orgun, S. Tilley, and J. Uhl. A reverse engineering approach to subsystem structure identification. Journal of Software Maintenance: Research and Practice, 5:181–204, 1993.
[64] H. Müller and J. Uhl. Composing subsystem structures using (k,2)-partite graphs. In Proceedings of the International Conference on Software Maintenance (ICSM), pages 12–19, San Diego, November 1990.
[65] OMG. UML 2.0 Superstructure Specification. Object Management Group, Framingham, Massachusetts, USA, October 2004.
[66] Margaretha W. Price and Steven A. Demurjian. Analyzing and measuring reusability in object-oriented design. In Proceedings of the 12th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 22–33, Atlanta, Georgia, USA, October 1997.
[67] W. Provost. UML for W3C XML Schema design. http://www.xml.com/pub/a/2002/08/07/wx-suml.html, 2006.
[68] RCP. Rich Client Platform. www.eclipse.org/rcp, 2005.
[69] M. P. Robillard and G. C. Murphy. Concern graphs: Finding and describing concerns using structural program dependencies. In Proceedings of the 24th International Conference on Software Engineering (ICSE), pages 406–416, Orlando, Florida, USA, May 2002.
[70] O. P. Rotaru and M. Dobre. Reusability metrics for software components. In Proceedings of the 3rd International Conference on Computer Systems and Applications (AICCSA), pages 24–32, Cairo, Egypt, January 2005.
[71] N. Routledge, L. Bird, and A. Goodchild. UML and XML Schema. In Proceedings of the 13th Australasian Database Conference (ADC), pages 274–281, Melbourne, Australia, February 2002.
[72] SDMetrics. SDMetrics User Manual. http://www.sdmetrics.com/manual/LOMetrics.html, 2006.
[73] Subhash Sharma. Applied Multivariate Techniques. John Wiley, 1996.
[74] S. C. Shaw, M. Goldstein, M. Munro, and E. Burd. Moral dominance relations for program comprehension. IEEE Transactions on Software Engineering, 29:851–863, September 2003.
[75] Suk Kyung Shin and Soo Dong Kim. A method to transform object-oriented design into component-based design using Object-Z. In Proceedings of the International Conference on Software Engineering Research, Management and Applications (SERA), pages 274–281, August 2005.
[76] A. Shokoufandeh, S. Mancoridis, and M. Maycock. Applying spectral methods to software clustering. In Proceedings of the Working Conference on Reverse Engineering (WCRE), pages 3–10, November 2002.
[77] H. M. Sneed. Encapsulating legacy software for use in client/server systems. In Proceedings of the Working Conference on Reverse Engineering (WCRE), pages 104–119, November 1996.
[78] G. Snider. Measuring the entropy of large software systems. HP Technical Report HPL-2001-221, 2001.
[79] T. A. Standish. An essay on software reuse. IEEE Transactions on Software Engineering, 10:494–497, September 1984.
[80] Ladan Tahvildari. Quality-Driven Object-Oriented Re-engineering Framework. PhD thesis, Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, August 2003.
[81] Ladan Tahvildari. Testing challenges in adoption of component-based software. In Proceedings of the ICSE Workshop on Adoption-Centric Software Engineering (ACSE), pages 21–25, Edinburgh, Scotland, May 2004.
[82] Ladan Tahvildari and Kostas Kontogiannis. Improving design quality using meta-pattern transformations: A metric-based approach. Journal of Software Maintenance and Evolution: Research and Practice (JSME), 16(4), 2003.
[83] Ladan Tahvildari and Kostas Kontogiannis. Develop a multi-objective decision approach for selecting source-code improving transformations. In Proceedings of the 20th International Conference on Software Maintenance (ICSM), pages 427–431, Chicago, Illinois, USA, September 2004.
[84] Ladan Tahvildari and Kostas Kontogiannis. Quality-driven object-oriented code restructuring. In Proceedings of the ICSE Workshop on Software Quality, pages 47–52, Edinburgh, Scotland, May 2004.
[85] Ladan Tahvildari and Kostas Kontogiannis. Requirements driven software evolution. In Proceedings of the 12th IEEE International Workshop on Program Comprehension (IWPC), pages 258–269, Bari, Italy, June 2004.
[86] Ladan Tahvildari, Kostas Kontogiannis, and John Mylopoulos. Quality-driven software re-engineering. Journal of Systems and Software (JSS), Special Issue on Software Architecture - Engineering Quality Attributes, 66(3):225–239, June 2003.
[87] P. Tonella, G. Antoniol, R. Fiutem, and E. Merlo. Points-to analysis for program understanding. In Proceedings of the 5th International Workshop on Program Comprehension (IWPC), pages 90–99, May 1997.
[88] W3C. XML Schema Part 1: Structures, second edition. http://www.w3.org/TR/xmlschema-1/, 2006.
[89] Ju An Wang. Towards component-based software engineering. Journal of Computing Sciences in Colleges, 16:177–189, October 2000.
[90] H. Washizaki and Y. Fukazawa. A technique for automatic component extraction from object-oriented programs by refactoring. Science of Computer Programming, 56:99–116, April 2005.
[91] H. Washizaki, H. Yamamoto, and Y. Fukazawa. A metrics suite for measuring reusability of software components. In Proceedings of the International Software Metrics Symposium (METRICS), pages 211–223, September 2003.
[92] N. Wilde, M. Buckellew, H. Page, and V. Rajlich. A case study of feature location in unstructured legacy Fortran code. In Proceedings of the 5th European Conference on Software Maintenance and Reengineering (CSMR), pages 68–75, Lisbon, Portugal, March 2001.
[93] N. Wilde and M. C. Scully. Software reconnaissance: Mapping program features to code. Journal of Software Maintenance: Research and Practice, 7:49–62, January 1995.
[94] W. E. Wong, S. S. Gokhale, and J. R. Hogan. Quantifying the closeness between program components and features. Journal of Systems and Software, 54(2):87–98, October 2000.
[95] W. E. Wong, S. S. Gokhale, J. R. Hogan, and K. S. Trivedi. Locating program features using execution slices. In Proceedings of the IEEE Symposium on Application-Specific Systems and Software Engineering and Technology, pages 194–203, Richardson, Texas, USA, March 1999.
[96] W. Eric Wong and J. Jenny Li. Redesigning legacy systems into the object-oriented paradigm. In Proceedings of the International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), Hakodate, Hokkaido, Japan, May 2003.
[97] X. Xu, C. H. Lung, M. Zaman, and A. Srinivasan. Program restructuring through clustering techniques. In Proceedings of the International Workshop on Source Code Analysis and Manipulation (SCAM), pages 75–84, September 2004.
[98] Yacc. Yet another compiler-compiler. http://dinosaur.compilertools.net/#yacc, 2006.
[99] Zhuopeng Zhang, Ruimin Liu, and Hongji Yang. Service identification and packaging in service oriented reengineering. In Proceedings of the 17th International Conference on Software Engineering and Knowledge Engineering (SEKE), pages 241–249, Taipei, Taiwan, China, July 2005.
[100] Wei Zhao, Lu Zhang, Yin Liu, Jiasu Sun, and Fuqing Yang. SNIAFL: Towards a static non-interactive approach to feature location. In Proceedings of the 26th International Conference on Software Engineering (ICSE), pages 293–303, Edinburgh, Scotland, UK, May 2004.
[101] Ying Zou and Kostas Kontogiannis. Towards a web-centric legacy system migration. In Proceedings of the ICSE Workshop on Net-Centric Computing (NCC), May 2001.