A Grid Oriented Approach to Reusing Legacy Code in ICENI Framework

Appeared in Proceedings of IEEE International Conference on Information Reuse and Integration (IRI'05), IEEE Systems, Man, and Cybernetics Society, 2005, pp. 464-469 (Pre-Print Version)

A Grid Oriented Approach to Reusing Legacy Code in ICENI Framework

Jianzhi Li, Zhuopeng Zhang and Hongji Yang
Software Technology Research Laboratory, De Montfort University, Leicester, LE1 9BH, England
{jianzhli, zpzhang, [email protected]}

Abstract
Legacy systems are valuable assets for organisations. They continuously evolve with newly emerging technologies in a rapidly changing business environment. ICENI provides an excellent Grid middleware framework for developing Grid-based systems, and it creates an opportunity for legacy systems to evolve in a Grid environment. In this paper, we propose a component-based reengineering approach which applies software clustering and program slicing techniques to recover components from legacy systems. It supports component encapsulation with JNI and component integration with CXML. The resulting components, built around core legacy code, function in a Grid environment.

Keywords: Legacy System, Software Reengineering, Software Clustering, Program Slicing, Component Mining, Grid Computing.

1. Introduction
The concept of Grid computing emerged within a project designed to link wide-area supercomputing resources for computational and collaborative science. It is now thought that over the next decade Grids will integrate low-cost commodity clusters, supercomputers, large-scale storage facilities, remote data sensors and mobile computing devices; the result will be a number of geographically distributed, unified computing resources which provide computing power 'on demand' and which access 'unlimited' data [2]. Although these prospects for Grid computing are still far from being realised, current research is constructing computational environments: Grids of distributed resources consisting of large supercomputers, databases and networked instruments connected by high-speed networks. For example, ICENI (Imperial College e-Science Network Infrastructure) is a Grid middleware system, consisting of a service-oriented architecture and an augmented component programming model, being developed at the London e-Science Centre [3]. This platform-independent framework explores effective application execution upon distributed federated

resources through open and extensible XML-derived protocols and a Java implementation.
Research on Grid computing has developed rapidly; however, the reuse of legacy systems in a Grid environment remains an intractable issue. Many large industrial and scientific applications available today were written well before Grid computing or Service-Oriented Architectures (SOA) appeared. The reuse and integration of these legacy systems in service-oriented Grid architectures, at lower cost and with higher performance, needs to be investigated. In this paper, we propose a component-based approach to migrating parts of legacy systems into a Grid environment, based on the ICENI infrastructure.
The rest of this paper is organised as follows. Section 2 presents related work in reengineering legacy systems with Grid technology. Section 3 elaborates our proposed approach and its core techniques in detail. Finally, Section 4 concludes the paper with a summary of our experience to date.

2. Related Work
Many approaches to the reuse and integration of legacy systems in a Grid environment have been proposed in previous research. Most of them endeavour to expose the whole legacy system as Web services, as summarised in [9]. The Numerically Intensive Java project (NINJA) [13] shows that there are no serious technical impediments to the adoption of Java as a major language for numerically intensive computing. In NINJA, language and compiler techniques are used to address Java performance problems. This type of approach is essential if Java is to be adopted throughout the high-performance computing community. Huang describes a semi-automatic conversion from legacy C code into Java code in [8]: after wrapping the native C application with the JACAW (Java-C Automatic Wrapper) tool, MEDLI (MEdiation of Data and Legacy Code Interface) is used for data mapping in order to make the code available as part of a Grid workflow. Based on different principles, [11] describes the deployment of legacy code as Grid services in GEMLCA (Grid Execution Management for Legacy Code

Architecture) without modifying the original legacy code. This approach only supports GT-3 services, which are based on the Open Grid Services Architecture (OGSA). In short, current approaches to reengineering legacy systems with Grid technology are not yet mature; novel approaches are required in this research area.

3. Proposed Approach

We propose a component-based Grid-oriented approach, which is especially applicable to reengineering tasks with the following characteristics:
• Legacy systems have some reusable and reliable functionalities in which valuable business logic is embedded;
• From the requirements point of view, the functionalities within the legacy components are meaningful and more powerful when exposed in a Grid environment;
• Reusable components extracted from a legacy system are considerably more maintainable than the whole legacy system;
• Some components of the target system run on different platforms or vendor products.

[Figure 1 (flowchart): Legacy system → Evaluation and reengineering decision-making → System decomposition by hierarchical clustering → Understanding legacy code by program slicing → Reusable legacy code segments → Encapsulating legacy code via JNI → CXML-based component integration → Component deployment in ICENI framework → Grid-oriented applications]
Figure 1. Proposed approach overview

The aim of this approach is to reuse recovered legacy components in the ICENI framework. Firstly, an evaluation of the legacy system is performed to confirm the applicability of this approach. Secondly, the legacy system is decomposed into component candidates via hierarchical clustering techniques, which are the essential techniques used for component mining in our approach. Then static program slicing techniques are applied to further understand these component candidates; reverse engineering techniques play an important role in this analysis process. Based on this comprehension, the component candidates are extracted as the legacy code segments of interest. In order to be reused as components, the extracted legacy code is refined and encapsulated, mainly through JNI (Java Native Interface). After these legacy components are created, they are integrated in the ICENI framework as a set of software resource services, which may be composed together to form a Grid application. The following subsections elaborate this proposed approach, which is shown in Figure 1.

[Figure 2 (flowchart): Legacy Code → Language-Specific Parsing → Annotated AST → Clustering Operations → Modules or Component Candidates → Static Program Slicing Understanding → Reusable Legacy Code]
Figure 2. Component mining process

The evaluation of the current legacy system is essential for further reuse, because a legacy system has normally been used and maintained for a long time. This evaluation reveals the current status of the legacy system and identifies the phase it has reached in its lifecycle. A decision tree is created during the assessment in order to consider many related aspects together, and the Options Analysis for Reengineering (OAR) technique is applied to guide the decision-making process. Reengineering decisions are made according to this assessment. If a legacy system is selected for the application of our component-based reengineering approach, a component mining process is performed. Reverse engineering techniques, such as hierarchical agglomerative clustering and static program slicing, are applied in this component mining process, as shown in Figure 2.

3.1 Legacy System Decomposition
Software systems evolve over time as a result of requirement changes. The resulting systems tend to have a rich and complex structure which is highly coupled. Due to long-term maintenance and evolution, the documentation may no longer reflect the actual structure of a legacy system. In many cases, effective partitioning or re-partitioning is needed to decompose legacy systems following the high-cohesion, low-coupling principle [18]. Software clustering techniques are applied to capture reusable legacy code segments, which are independent, self-contained, coarse-grained and loosely coupled. Software clustering groups large amounts of entities in a dataset into clusters according to their relationships and similarity. Before applying clustering analysis, the entities to be clustered, the similarity measure between two entities and a clustering algorithm must be defined.

[Figure 3 (dendrogram): the 22 entities E1-E22, i.e. the subsystem's functions (Report(), Deposite(), Mod_trans(), Dep_gui(), Withdraw(), Mod_menu(), Mod_choice(), Intro(), New_account(), Process(), New_account_gui(), Main_menu(), Getaccount_no(), Trans_menu(), Sav_account(), Trans_gui(), Update_main(), Show_trans(), List_all(), See_it(), List_all_gui(), Print_it()), grouped into clusters labelled New Account Creation, Transaction Related Operations and Outputs]
Figure 3. An example of a dendrogram

An entity could represent a subsystem, a directory, a file, a class, a function, a data structure, a variable, and so on. A resemblance coefficient for a given pair of entities indicates the degree of similarity or dissimilarity between the two entities, depending on the way in which the data is represented. A resemblance coefficient may be qualitative or quantitative; which should be chosen depends on the application and the attributes. After applying a clustering algorithm, the results should be evaluated and interpreted correctly [17].
The clustering techniques adopted in our approach are based on numerical taxonomy, which uses numerical methods to classify entities and has conceptual and mathematical simplicity [14]. The overall computational complexity of the algorithm is O(n^3) for an n × n matrix. In practice, the computation runs quickly due to its simplicity.
The proposed approach has been applied to a legacy bank system. A clustering analysis process on a subsystem of this bank system is described in this section. The subsystem is written in the C language and consists of 22 functions, or about 1,263 lines of code. The clustering analysis process can be divided into the following steps.
Step 1. Obtaining the data matrix. In this case study, the entities are defined as functions, because the legacy bank system is procedure-based code. In order to apply the clustering technique, we tailor it to function-function interconnections. In this case, the data set represents the interdependencies or interconnections indicated by function calling relationships. Entity interconnections are defined differently for different kinds of software systems; for object-oriented programs, the coupling information could be class-attribute, class-method, method-method, the inheritance among classes, or a shared feature.
In this input data matrix, the 1 entries show that the corresponding functions are interconnected. The matrix is symmetrical, and 1 entries are used on the main diagonal. The numeric values (1 or 0 entries), which show the interconnections among functions, are obtained from a static analysis of the source code. The number of function calls based on syntax analysis, however, does not reflect the actual number of invocations of those functions. Dynamic information may be more insightful, but it is difficult to obtain and is highly dependent on run-time behaviour. Reengineering Assistant (RA), a semi-automatic reengineering tool [19], is used to capture this coupling information and generate the input matrix.
Step 2. Computing the resemblance coefficients for the matrix. To ascertain the similarity between two entities, we calculate the proportion of relevant matches between them. There are different methods of counting relevant matches, and many algorithms exist to calculate the similarity or resemblance coefficient [14]. We adopt the Sorenson coefficient to calculate the similarity between functions. Let a denote a 1-1 match, which means the same attribute is coded 1 for both entities; similarly, let b, c and d denote the 1-0, 0-1 and 0-0 matches between two entities. Let Sxy be the resemblance coefficient for entities x and y; then the Sorenson coefficient is Sxy = 2a / (2a + b + c). Compared to other coefficient definitions, the Sorenson coefficient emphasises 1-1 matches by giving them double weight, because these attributes are present in both entities at the same time. Another reason we use the Sorenson coefficient for software clustering is that it does not count 0-0 matches: from the software perspective, 0-0 matches mean that the software entities share no commonalities and have no relation, so they should be ignored in the similarity function.
Step 3. Executing the clustering algorithm. In essence, the clustering algorithm is a sequence of operations that incrementally groups similar entities into clusters. The sequence begins with each entity in a separate cluster.
At each step, the two clusters that are closest to each other (indicated by the largest Sorenson coefficient) are merged, and the number of clusters is reduced by one. Once these two clusters

have been merged, the resemblance coefficients between the newly formed cluster and the remaining clusters are updated to reflect their closeness to the new cluster. An algorithm called UPGMA (Unweighted Pair-Group Method using Arithmetic Averages) [14] is used to average the Sorenson coefficients when two clusters are merged. Dendrograms are adopted to demonstrate the clustering process and the proximity between entities. A dendrogram presents a hierarchical view of the legacy system, consisting of different levels of abstraction from source code to subsystems. In order to extract a functional component from legacy code, a cutting point must be determined in the dendrogram. This cutting point affects the number and quality of the components derived from the dendrogram, and it is a key factor in determining the granularity of the extracted components. So far, the cutting point is determined with the participation of architects; other existing and recovered design information facilitates this decision-making process. This clustering method, together with human supervision, restructures legacy systems into coarse-grained, loosely coupled components. Figure 3 shows an example of a dendrogram.

3.2 Selected Code Componentisation
After the component candidates are identified by clustering techniques, they need to be further understood. Although the selected component candidates are independent and loosely coupled, they may still have some coupling relationships with other software entities, which need to be handled according to design decisions. Furthermore, there could be some dead code within the selection, which affects the quality of the target components and must be eliminated. In order to make the selected code segments function independently, the parameters for the component interface must be determined. Achieving these goals requires deep source code comprehension, which is realised by program slicing techniques.
Program slicing techniques have been investigated for many years [16]. In this paper, we concentrate on static program slices. A slice is computed in the following three common steps:
• Map the slicing criterion onto a node.
• Discover all backward reachable nodes.
• Map the reached nodes back onto the statements.
The slicing algorithm is defined as follows. A slicing criterion is a pair <p, V>, where p is a program point and V is the set of variables referenced at p. A program slice on this slicing criterion is a subset of program statements that preserves the behaviour of the original program at the program point p with respect to the program variables in V. A slice with respect to such a slicing

criterion consists of the set of nodes that directly or indirectly affect the computation of the variables in V at node p [7]. This form of slice is also called a backward slice, in contrast to a forward slice, defined as the set of program statements and predicates affected by the computation of the value of a variable V at a program point p. By analysing the input program, a representation is created based on the static dependences in the program, and the program is instrumented to create the necessary runtime information. The slicing criterion is identified with a vertex in the Program Dependence Graph (PDG) [10], and a slice corresponds to all PDG vertices from which the vertex under consideration can be reached. The statements and expressions of a program constitute the vertices of a PDG, and its edges correspond to the data and control dependences between statements. For one variable, a decomposition slice is built, which is the union of certain slices taken at certain line numbers on the given variable. The complement of the decomposition is then obtained from the original program: it is constructed in such a way that when certain statements of the decomposition slice are removed from the original program, the program that remains is the slice that corresponds to the complement of the given criteria, with respect to the variables defined in the program. Thus the complement is also a program slice.

(1)  float essential;
(2)  float interest;
(3)  float payment;
(4)  float withdraw;
(5)  interest = interest / 100 / 12;
(6)  payment = payment * 12;
(7)  withdraw = 0;
(8)  x = pow(1 + interest, payment);
(9)  monthly = (essential * x * interest) / (x - 1);
(10) withdraw = withdraw + i;
(11) if ((0 < ...))   /* remainder of this listing lost in extraction */

jfloat *body = (*env)->GetFloatArrayElements(env, arr, 0);
float Loan_payments(float essential, float interests, float payment)
{
    essential = body[0];
    interests = body[1];
    payment = body[2];
    interests = interests / 100 / 12;
    payment = payment * 12;
    x = pow(1 + interests, payment);
    monthly = (essential * x * interests) / (x - 1);
    if ((0 < ...))   /* statements lost in extraction */
    sum = (*env)->NewFloatArray(env, len * sizeof(float) + 1);
    (*env)->SetFloatArrayRegion(env, sum, 0, 10 * sizeof(float), (jfloat *) jarray);
    (*env)->ReleaseFloatArrayElements(env, arr, jarray, 0);
}

Figure 5. Wrapping legacy code with JNI

The proposed method to create a JNI wrapper for reusable legacy code can be divided into three steps:
• Define a new class that extends and overrides the original method. The Java system routes all drawing operations for a new object through the original method, as it does for all other GUI objects.
• Generate a header file that describes the interface to the native method that Java expects to be used. The necessary correspondence between the Java and other-language files is created automatically to enable native calls.
• Write the native methods with an interface that conforms to the generated header file, and build them as a standard shared library. The user can then initiate and control all these actions through a simple GUI.
Figure 5 shows an example of wrapping reusable legacy code of a credit card service in a bank system with JNI. The reusable legacy code in the C language (the Loan_payments function) is in charge of calculating the total repayment amount in a credit agreement when supplied with a number of parameters. The reference to the Loan_payments function is defined as an interface. It is treated as a class to specify an instance in the legacy code, and then the functionality of the Loan_payments function is embedded in the target application. The essential JNI glue code is illustrated above.
