Scenarios for the Identification of Objects in Legacy Systems

Theo Wiggerts, Hans Bosma
ID Research, Groningenweg 6, 2803 PV Gouda, The Netherlands
[email protected], [email protected]

Erwin Fielt
DPFinance, De Brand 10, 3823 LH Amersfoort, The Netherlands
[email protected]

Abstract


In this article¹ we propose an incremental approach to the identification of (business) objects in legacy applications. In this approach, different object identification scenarios can be applied alternately. Three different strategies are presented: function driven, data driven and object driven objectification. We discuss these scenarios and report on experiences gained from applying them to a subsystem of a real-life mortgage system. We also discuss related work.

¹ The authors were all sponsored by bank ABN AMRO, software house DPFinance, and the Dutch Ministry of Economic Affairs via the Senter Project #ITU95017 'SOS Resolver'.

1. Introduction

Organizations operate in ever faster changing business environments. Globalization of markets, increased competition, and mergers and acquisitions force organizations to reevaluate their products and their sales, distribution, and marketing strategies. The information systems that organizations depend upon must also continuously change together with the organizations. The following lists some of these necessary changes:

- supporting a shorter time-to-market process;
- supporting the 'customer oriented organization' instead of the 'product oriented organization';
- using corporate databases not only for administrative purposes, but also as a tool for strategic marketing and management information;
- making it easier to adapt to external influences (for example, the introduction of the Euro, the single European currency);
- taking advantage of new opportunities offered by new technology, like call centers and the Internet.

The ease with which such changes can be carried through depends heavily on the system characteristics flexibility, distribution and integration. Current mainframe information systems do not score well on these characteristics: because of the size and complexity of the systems, their flexibility is low, and the mainframe infrastructure is, putting it mildly, not renowned for its openness. The technology that promises to assist in achieving better results for these characteristics is 'object oriented' (OO) technology. A discussion of the advantages of object orientation, or of what is meant by object orientation, is beyond the scope of this article². For now it suffices to say that OO enables a more 'natural' division of responsibilities over objects that stand closer to the real world than the data and functional elements found in traditional systems development. The objects which represent the 'things' people encounter in their day-to-day work are often called 'business objects'.

² To those readers who do not have a general feeling of what object orientation is about, we recommend reading one of the books [17], [1], [20].

There has been an important impetus to the introduction of object orientation in the mainframe world: the introduction of the new standard of COBOL that is known as Object Oriented COBOL (see e.g. [12] for the Micro Focus implementation). The object oriented features of OO COBOL are impressive: class, object, messages, dynamic binding and inheritance (even multiple inheritance) are introduced into the language, and garbage collection is prescribed. This makes it a language with which the advantages of object technology should be within reach, also for COBOL oriented software departments. A major feature of OO COBOL is that it is downward compatible: COBOL programs written in older COBOL dialects (which comply with older standards) should also compile and run with a new OO COBOL compiler. This gives hope that legacy code can be reused in object oriented (re)developments in the future. Furthermore, the learning curve for COBOL programmers is expected to be less steep, as they do not face both a new paradigm and a new language. Because COBOL is by far the most used language in mainframe environments, this introduction determined the theme of our research: identifying objects in legacy COBOL applications to support the migration from COBOL to Object Oriented COBOL. Most of the results will apply to other procedural languages as well, however.
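To give a flavour of these language features, a schematic OO COBOL class is shown below. This is our own illustration, not code from the article's case study; we use ISO-style syntax, while the exact division headers vary per compiler dialect.

       CLASS-ID. Mortgage.
       OBJECT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  principal         PIC 9(9)V99 VALUE ZERO.
       PROCEDURE DIVISION.
       METHOD-ID. Set-principal.
       DATA DIVISION.
       LINKAGE SECTION.
       01  lnk-amount        PIC 9(9)V99.
       PROCEDURE DIVISION USING lnk-amount.
      *>   instance data is encapsulated; only methods touch it
           MOVE lnk-amount TO principal
           GOBACK.
       END METHOD Set-principal.
       END OBJECT.
       END CLASS Mortgage.

A client sends a message with, for example, INVOKE a-mortgage "Set-principal" USING ws-amount; dynamic binding selects the receiver's implementation at run time.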

Outline. In this article we discuss our research on the identification of objects in existing procedural systems. This research is part of the larger project 'Resolver', which is conducted by the Centre for Mathematics and Computer Science (CWI), the University of Amsterdam and ID Research. The goal of the Resolver project is to contribute to the solution of renovation problems by increasing and building up knowledge which can be employed in actual system renovation projects. In this article we explore different scenarios for object identification which fit in a bigger picture and can be applied alternately. Experiments on a subsystem of a mortgage application were performed to test the applicability of these scenarios. The outline of this article is as follows: in section 2 an overall strategy for object oriented reengineering is presented, and it is argued that an incremental approach to business object identification is often preferable. In section 3 we discuss different scenarios which fit into such incremental identification. In section 4 we briefly discuss related work, and some results of initial experiments with the scenarios are presented in section 5. At the end of this article we present conclusions and directions for further research.

2. Incremental OO Reengineering

When migrating to object orientation, one would ideally completely rebuild the original application in an object oriented fashion. In [3] it is argued that such a 'cold turkey' migration is in most cases not a serious option because of issues concerning risk, time and resources. We argue therefore that a 'chicken little', or incremental, approach should be taken when renovating software. In order to re-implement an application (using another paradigm) we must first rediscover the requirements the system fulfills. Following the terminology presented in [4], this means that a reverse engineering process is followed by a forward engineering process. In other words, in order to re-implement a system in an object oriented way it must be reengineered. Often the old system is the only source of information for requirements rediscovery. Experience shows that the available documentation is often not up to date or inaccessible due to its technical nature. In any case, the actual running system is the only sure source of truth³. So it must be analyzed to gain an understanding of what it is actually doing (the reverse engineering step). If the analysis result is presented in terms of business objects, a new object oriented design can be made.

³ In this article we assume that the source code of this version of the system is available. Note that in practice this need not always be the case!

Figure 1 illustrates this 'object oriented reengineering'. On the left the old system appears. Rectangles depict data structures (record structures, databases, etc.) and ovals represent functional units (procedures, functions, programs, etc.) which operate on the data structures. In the middle the requirements are visualized in terms of business objects and their interrelationships. Finally, on the right we see the object oriented re-implementation consisting of a set of cooperating objects (the cell- or lifebuoy-like symbol is often used to visualize objects).

The reverse engineering step is seriously complicated by the fact that legacy applications have evolved over many years and are blurred by all sorts of technical design decisions. Because of this, their structure has often become very interwoven and impenetrable. Requirements are hidden by cascades of design decisions and maintenance operations. Therefore, even cold turkey reverse engineering may very well be impracticable. To overcome this problem we propose to benefit from the hybrid nature of OO COBOL (and other hybrid languages) by extending the incremental line of thought with an incremental approach to object (class) identification in legacy applications.

Figure 2 illustrates this idea. In the original application some objects are identified by applying one of the scenarios discussed in section 3. The objects are implemented, and the code which previously implemented the functionality provided by these objects is discarded. Other necessary adaptations, such as adding the appropriate method invocations, must be made as well. This results in a hybrid application. During several iterations other objects are identified and implemented in the intermediate hybrid application in the same way. Finally, an object oriented counterpart of the original application has evolved. The approach also provides the possibility not to go 'all the way' and to implement a specific intermediate result which satisfies the current need for flexibility.

Note that hybrid applications need a hybrid environment: an environment that supports the procedural as well as the object oriented paradigm. Creating such a hybrid environment, and migrating the original application into it, is beyond the scope of this article. The interested reader is referred to literature in the area of bridging technology (e.g. [8], [9]). The main goal remains the identification of business objects. In the sequel of this article we discuss different scenarios for the identification of objects in legacy applications. One of these scenarios is applied in each iteration of the approach.


Figure 1. Object oriented reengineering: first an (object oriented) analysis is done, then an object oriented design is made

Figure 2. Incremental migration to object orientation

3. Object identification scenarios

We have distinguished three scenarios, each of which adopts another starting point for replacing an application part by objects. The three scenarios have been called function driven objectification, data driven objectification and object driven objectification. They can be characterized as follows. The function driven objectification takes a functional part of an application as starting point and expresses that functionality in terms of objects. The data driven scenario strives for the identification of objects by analyzing data structures in the application, followed by the 'methodification' of procedural code. The object driven objectification uses domain knowledge to identify useful objects and then analyzes the application to trace the application code that must be replaced by the identified objects. In the overall strategy presented in the previous section, these scenarios can be applied alternately. In the following paragraphs each scenario is discussed in a programming language independent manner.

3.1. Function driven objectification

In this scenario we select one functional part (a collection of programs or a subsystem) as our starting point. The functionality of the selected part is expressed in terms of objects. For the identification of these objects at this lower level of granularity, one can again consider applying one OO reengineering step, or choose to apply (a combination of) the object identification scenarios discussed in this section. In traditional systems development the structure of a system is the result of functional decomposition and technical considerations. Selecting a part from the existing structure for object identification will therefore result in objects which can be expected to be process-like and technical rather than business-like. However, identifying such objects can be very useful for hiding technical details and thus increasing the application's flexibility.

An implementation technique which follows the function driven scenario and is often used in practice is called wrapping. This technique is not used to identify objects but to implement the identified objects and reuse existing code from the selected application part by 'wrapping' it. The wrapper consists of the objects we have identified. Other parts of the previous intermediate application will communicate with the wrapped code through the objects in the wrapper. This implies that calling wrapped procedures (or functions, routines, programs, etc.) and directly accessing data within the wrapper is not allowed from outside the wrapper. All such communication should be replaced by calls to methods of the objects in the wrapper.

The old code is reused by letting the methods call the proper procedures and access the proper data in the wrapped code. One could ask what is gained by introducing an indirection step. The benefit is that, from the outside, the wrapped application part looks and behaves like an object oriented application part. By hiding the implementation of the functionality (the original code), changes to this code can be made without affecting any code outside the wrapper. The application has become more modular (one of the benefits claimed by object orientation).

Figure 3 illustrates wrapping. The selected part of the application is placed inside the wrapper, which consists of objects. Direct communication (the dashed arrow) from an external part to the wrapped part is forbidden. The methods of the wrapper objects can be implemented by calling procedures and accessing data in the original code.

Figure 3. Wrapping: a wrapper of objects is built around the selected application part

3.2. Data driven objectification

Objects are often defined as data with functions working on that data. Following this view, this scenario uses data structures in the legacy application as starting points for objects. So rather than existing functional parts, the parts selected by this scenario are data structures. These data structures can be variables, record structures, tables, files or whole databases. Procedures, functions, programs, etc. which work on the data structures become methods of the corresponding objects. In terms of responsibilities this means that the application, which was responsible for all of the functionality, delegates responsibilities to the proper data structures.

This scenario asks for a more rigorous restructuring than the function driven scenario, because the use of a particular data structure is often spread widely throughout a legacy system. However, the objects identified in this scenario can be expected to be more 'business-like', because the underlying data model often follows the domain more closely than the functional structure, which is more influenced by technical design decisions. This is one of the main reasons why an application's data model is often found to be more stable than its functional structure.

When applying this scenario, decisions have to be made as to what to do with a procedure of which different parts work on different data structures. In each such case the question to be answered is: "Which structure (object) should be responsible for this functionality?". Should the whole procedure be assigned to one main structure, or should it be split up, with new methods created out of fragments of different procedures? On the other hand, if a fragment works on two or more data structures and can neither be split up nor assigned to one data structure in a satisfactory way, may it be assigned to more than one of the data structures involved, thereby introducing redundancy? Or should we consider merging the two data structures into one object?

Figure 4 shows how this scenario affects the original application. Each data structure is placed in an object (lifebuoy). The procedures are placed in the slots of the lifebuoy; each slot represents a method. So the original procedures become methods of the data objects they accessed. The dashed oval illustrates that there are several possible choices when assigning procedures which access multiple data structures.

Figure 4. Data driven objectification: the data structures in the original application serve as starting points for objects
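A sketch of the resulting shape (hypothetical names; ISO-style OO COBOL syntax): a record structure becomes the instance data of a class, and a procedure that worked on the record becomes one of its methods.

       CLASS-ID. Client.
       OBJECT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *> formerly an 01 record in the procedural code
       01  client-name        PIC X(30).
       01  client-balance     PIC S9(9)V99 VALUE ZERO.
       PROCEDURE DIVISION.
      *> formerly a procedure working on that record
       METHOD-ID. Add-payment.
       DATA DIVISION.
       LINKAGE SECTION.
       01  lnk-amount         PIC S9(9)V99.
       PROCEDURE DIVISION USING lnk-amount.
           ADD lnk-amount TO client-balance
           GOBACK.
       END METHOD Add-payment.
       END OBJECT.
       END CLASS Client.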

3.3. Object driven objectification

Unlike the previous scenarios, this scenario is not driven by technology (the existing code). It makes extensive use of existing and reengineered knowledge of the application domain. We argued that identifying objects for an entire system in one pass is often not practicable; the idea can, however, be applied incrementally as well. By performing a low intensity analysis (on the basis of expert knowledge, system specifications and the legacy application), objects can be identified that will replace parts of the code of the application. For some business objects, common sense dictates that they must be represented in some form in a particular system: in a mortgage application there must be mortgages and clients, and in a car registration system there must be cars. Tracing which part(s) of the existing application must be selected to be replaced by the identified objects is obviously not an easy task. The exact functionality associated with these objects has to be reverse engineered from the legacy application. The identified objects can be implemented and added to the application. All references to data structures and functionality associated with an object are changed to method invocations. Depending on the application structure, code may be reused as well. It is obvious that this scenario is the least mechanical, and thus the least suited for automation.
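Schematically, the replacement of references works as follows (hypothetical names; the first statement stands for the many scattered places where legacy code touches the data directly, the second for its object oriented counterpart):

      *> before: direct manipulation of the legacy data structure
           MOVE ws-new-street TO REL-STREET OF RELATION-RECORD
      *> after: a message to the identified business object
           INVOKE a-relation "Set-street" USING ws-new-street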

4. Related work

In this section we briefly discuss related work: the tools available and the methods and techniques presented in the literature on objectification, and on migrating COBOL applications in particular. Because of the infancy of the field, most tools are research prototypes or general reengineering and program understanding tools. The known approaches are classified according to the scenarios presented in section 3.

4.1. Function driven objectification

Grotehen [8] discusses the use of business objects for wrapping. An object request broker is used to manage the communication between the objects. Each business object is specified to the request broker in CORBA IDL and mapped to a legacy application. In a Systems Techniques white paper [16] the following basic categories of wrapping products are distinguished: screen scrapers, legacy code wrappers and bridges, and database access tools. The white paper briefly discusses some tools in each category. Sneed [14] describes means to encapsulate existing Assembler and COBOL programs at different granularity levels: job level, transaction level, program level, module level and procedure level.

The Micro Focus Object COBOL Workbench [12] offers the 'COBOL As a Class' (CASC) option to turn entire programs into objects. For a particular program a class is created whose methods are the sections of the program. The data structures, paragraphs and their interrelationships are not changed. PERFORM statements are transformed into INVOKE statements. The program is now wrapped by a single object. Figure 5 illustrates this.

Figure 5. A CASCed program

This process allows the former procedures of a program, which were not visible to the outside world, to be invoked by other programs. Note that this can be quite dangerous when such a procedure sets variables, etc. A program which has been turned into a class can also be reused by inheritance: a subclass of the program's class only needs to alter some methods to achieve slightly different behavior. Transforming program sections into methods requires that the parameters of each section are resolved. This is necessary as the procedures from which they originate do not have parameters. Newcomb and Kotik [10] use data flow analysis and knowledge of COBOL programming practice for the equivalent problem of transforming PERFORM statements into CALL statements (which also need parameters).
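Our illustration of the transformation on a toy program (hypothetical code, not actual Workbench output):

      *> Before CASC: an ordinary program performing its own section.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PGM1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ws-total    PIC 9(6) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-LOGIC SECTION.
           PERFORM ADD-ONE
           DISPLAY ws-total
           STOP RUN.
       ADD-ONE SECTION.
           ADD 1 TO ws-total.
      *>
      *> After CASC: PGM1 has become a class whose methods are its
      *> sections, and the PERFORM is rewritten as an invocation on
      *> the single object that now wraps the program:
      *>     INVOKE pgm1-object "ADD-ONE"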

4.2. Data driven objectification

REDO was an ESPRIT project on reengineering [22]. In this project an approach was developed to transform COBOL programs into Z++ specifications via a meta-language called UNIFORM (Z++ is an object oriented extension of the formal specification language Z). The approach works on autonomous batch programs, i.e. programs which do not access databases or other data sources.

A tool supported process for extracting objects from existing COBOL programs is described by Sneed [15]. The process is based on human interaction to select objects, coupled with automated slicing techniques to identify all of the elementary operations which change the state of the selected object. The object is redefined within the framework of an OO COBOL class and the elementary operations are attached to it as methods. The result is a set of OO COBOL classes.

MOORE [7] stands for 'Methods for Object Oriented ReEngineering'. It is a research project at GMD, Germany's national research center for information technology.


In the project a prototype tool has been developed which also goes by the name MOORE. The tool supports the user by making transformation proposals, presented as weighted lists. Data structures serve as candidates for objects. A class library is searched to find out whether a suitable class already exists for a data structure; for the comparison of data structures, their internal coded representations are used. Each newly created class is added to this library. The user has to provide a name for the new class and specify the superclasses. MOORE also makes proposals for code segments which check or alter attributes to become methods. Data flow analysis is used to determine the 'local variables' of program segments when viewed as subroutines. Sequences of statements with a small number of local variables are candidates for methods. The parameters of these methods also follow from the data flow analysis. Transformations are implemented as tree manipulations and the classes are managed by a repository.

The tool of The Software Revolution company is based on Software Refinery from Reasoning Systems [10]. It takes the sources of COBOL programs as input and produces an abstract object oriented model. The tool works on abstract syntax trees. Data flow analysis and several other analyses are supported. Data structures with the same physical definition are said to be the same class, even if they have totally different semantics. There are different classes for different programming concepts: a data object class, a program object class, a procedure object class, and several method classes. State-transition tables are created to depict procedure control behavior. The abstract object oriented model is normalized: alias analysis is used to accomplish instance-class generalization, and substructures are compared to achieve subclass generalization. Methods are normalized by promoting constants within methods to parameters; methods which have become identical are merged.

Also useful for this scenario are methods and techniques developed for data reverse engineering. Many papers have been published on this topic for more than a decade, and several CASE tools dedicated to data structure recovery are available. Particularly interesting is the research carried out by Yang and Bennett [21], who describe an approach for the acquisition of Entity Relationship Attribute (ERA) diagrams from data-intensive COBOL code. Their method is based on formal transformations using information extracted from data structures and imperative code.

4.3. Object driven objectification

Jacobson and Lindstrom [9] describe an OO reengineering approach which has the development of an object oriented analysis model as a first step. The objects in this model are mapped to the implementation of the legacy system. This is a far from trivial process, and the authors only mention that it is done by using both the legacy code and other sources of information regarding the system, as well as guidance from experts on the system. The system is redesigned using object oriented forward engineering. It is possible to reengineer only a part of a system. This part is specified by selecting a set of analysis objects; the part of the implementation to which these objects were mapped is the part to be reengineered. The interface with the remaining part of the system consists of those objects which are related to the objects in the selected set but are not members of this set themselves.

In [6] it is proposed to create general executable domain models which are parametrized for the legacy system at hand. Object oriented frameworks are used to implement the executable domain models. The framework is instantiated during the legacy system comprehension process. Once there is a domain model in which the whole system is expressed, this model can be used to support the evolution of the system.

COBOL/SRE stands for COBOL System Renovation Environment [11]. It consists of a set of tools which work on a knowledge base containing information about a legacy system. The environment creates a virtual data model, based on analysis of data record mappings, data assignments and data flows through files. Of particular interest here is what is termed Reusable Component Recovery. This is an automated technique aimed at recognizing functional patterns in the legacy code. These need not be mapped to consecutive pieces of code: they are segments (sets of program statements) of semantically related pieces of code. Concepts to be recognized are described in plans. The user is supported while isolating semantically related segments (focusing) and while packaging these segments into separate modules (factoring). Five focusing operations based on program slicing are provided, as are operations working on segments, like union, intersection and difference. However, the problem of combining both data and procedural statements into objects still has to be solved.

The Fusion method for object-oriented development of systems has been adapted for reverse engineering of legacy code: Fusion/RE [13]. The objective of Fusion/RE is to produce a system analysis model based on Fusion, i.e. an object model, a life cycle model and an operations model. This is done in three steps: revitalizing the system architecture of the current system, then creating an analysis model of the current system, which is generalized to create an abstract analysis model. Heuristics are provided to assign procedures to classes.

5. Experiments

In this section we briefly report on the experiences gained by performing initial experiments on a real-life system: the relation administration subsystem of a mortgage system. This is a mainframe system which is over ten years old (and is based on a system which is even older) and has evolved since it was first released. It is written in VS COBOL II (ANSI '74 compliant), uses the CICS transaction processor and VSAM files, and makes extensive use of copybooks. These design decisions have had a major impact on the subsystem's structure. The relation administration subsystem comprises approximately 100,000 SLOC using some 1,000 copybooks. For our experiments we used the Micro Focus COBOL Workbench, which supports both OO COBOL and older dialects. This required migrating the mortgage system to the PC-based Workbench environment, followed by the recompilation of the sources. Although the operation was rather complex, it did not cause major difficulties.

5.1. Function driven experiments

We did experiments with the COBOL As a Class (CASC) option provided by the Workbench (see section 4.1). Alas, we ran into environmental problems, which are probably due to the fact that CASC is an early release feature in the Workbench. We considered two programs whose function is to make an 'extraction' of a street name and a city name respectively. An extraction is defined as the first four or five characters of the longest word in the string that makes up a street or city name. These two programs are very much alike. The only differences are that the resulting street name extraction has four characters while the extraction of the city name has five, and that one program uses 'city' in its data item names, the other 'street name'. So the idea was obvious: use the subclassing option of CASC to reuse the identical code. In the experiment we wanted to create a 'superprogram' (class) with the similar code and make the two original programs (classes) inherit its behavior. But a small detail prevented this from succeeding. The COBOL As a Class subclassing option requires that the subclass has a linkage section conforming with the superclass: the linkage section cannot be overridden in a subclass. The two original programs have a different linkage section, because of the difference in the length of the resulting extraction. This means that they cannot both inherit from one superclass. We furthermore observed that sections and paragraphs in the HYPOS system cannot be seen in isolation: CASCing a HYPOS source did not yield useful subroutines. We suspect that, in general, this observation holds. Procedural programs are not designed in such a way that an individual section or paragraph can be seen as a 'method'; it has relevance only in the context of the program that it is a part of.
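To make this concrete, here is a reconstruction of the shared logic (hypothetical code, not the subsystem's source): one subprogram computing the extraction for a caller-supplied length. Factoring the length out as a parameter is what the intended superclass would have had to do, and it is exactly this difference in the interface that the conforming-linkage-section rule disallows.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. EXTRACT.
      *> Computes the 'extraction' of a name: the first lnk-ext-len
      *> characters of the longest word. (Hypothetical reconstruction.)
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  ws-word        PIC X(30).
       01  ws-reversed    PIC X(30).
       01  ws-blanks      PIC 9(4) COMP.
       01  ws-len         PIC 9(4) COMP.
       01  ws-max-len     PIC 9(4) COMP.
       01  ws-ptr         PIC 9(4) COMP.
       LINKAGE SECTION.
       01  lnk-name       PIC X(30).
       01  lnk-ext-len    PIC 9(4) COMP.
       01  lnk-extraction PIC X(5).
       PROCEDURE DIVISION USING lnk-name lnk-ext-len lnk-extraction.
           MOVE 1 TO ws-ptr
           MOVE 0 TO ws-max-len
           MOVE SPACES TO lnk-extraction
           PERFORM UNTIL ws-ptr > 30
      *>       take the next blank-delimited word
               MOVE SPACES TO ws-word
               UNSTRING lnk-name DELIMITED BY ALL SPACE
                   INTO ws-word WITH POINTER ws-ptr
               END-UNSTRING
      *>       word length = 30 minus its trailing spaces
               MOVE FUNCTION REVERSE(ws-word) TO ws-reversed
               MOVE 0 TO ws-blanks
               INSPECT ws-reversed TALLYING ws-blanks
                   FOR LEADING SPACE
               COMPUTE ws-len = 30 - ws-blanks
      *>       keep the prefix of the longest word seen so far
               IF ws-len > ws-max-len
                   MOVE ws-len TO ws-max-len
                   MOVE ws-word(1:lnk-ext-len) TO lnk-extraction
               END-IF
           END-PERFORM
           GOBACK.

Both callers would pass a five-character result buffer together with the desired extraction length (4 or 5).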

5.2. Data driven experiments

To perform a data driven experiment we looked for data which could be selected in order to identify objects. In the subsystem at hand there proved to be one major file which contained most of the business data. It contains all data on relations (people or corporate bodies which have any connection with the mortgage department). Nearly all programs in the subsystem have something to do with this file. So simply taking this file as the basis for an object class relation, and turning all programs that work on the file into methods of this object class, would not improve the structure of the subsystem. A lower level of granularity is required: the relation record has to be divided over several smaller classes.

A way to split up the relation record would be to use some variant of normalization, as is done for relational database systems (see e.g. [5]). But in order to use normalization, knowledge of the functional dependencies between data attributes is required. This knowledge could not be found in the documentation (and if it were described there, it might not be valid anymore). The maintainers of the system did not have a data model containing information on these dependencies at their disposal either. Other heuristics were needed to create objects out of the data in the relation record. Records in COBOL have a hierarchical structure: the highest level record is the 01 record, and lower level records are indicated by higher numbers. Unfortunately, the record definition does not indicate much structure. There are two parts in the 01 record relation, the 03 level records. Within each of these parts there is no further hierarchic subdivision; they both have a flat structure. An OCCURS clause indicated that a relation record can contain multiple occurrences of the second 03 record, depending on the value of a specific field. This suggested that the second 03 record could be promoted to become a separate object. General domain knowledge confirmed our hypothesis, as we had just split off the class address. The record description did not provide more clues for a further subdivision of the two classes. Information on which attributes of the classes relation and address could be promoted to become separate objects must thus be extracted from the programs operating on the data. We employ the hypothesis that whenever only a subset of the attributes of an object is operated upon by a program or a section within a program, this subset is a candidate for becoming a separate object associated with the former object, and the program logic operating on the subset is a candidate for becoming a method of this object.

This is easier said than done, however. The lack of a parameter passing mechanism in COBOL results in a lot of data being copied around in the application. Complex data flow analysis techniques are necessary to track a particular variable use back to its source in the relation record. Data flow analysis is also required to find out which fields in a record are actually used when the entire record is 'passed'. Furthermore, programs are used to implement more than one function, resulting in rather complex control flow graphs which cannot easily be split up into routines that can be turned into methods. To automate these techniques, further research is carried out within the Resolver project (section 6). However, we have gained some promising results by dealing with these problems manually. We found, for example, that the fields name, initials, prefixes and titles-code were used together by some programs. And indeed the combination of these fields forms a business object, which we called full-name. Another program is called to look up a string in a table depending on the value of titles-code. This suggested that titles should be turned into a class of its own, having a method which could be named something like Present-as-string. So we have a class titles being part of the aggregate full-name, which in its turn is part of the aggregate relation.
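As an illustration of the record shapes involved (a hypothetical reconstruction; the actual copybook is not reproduced in this article), an 01 record with two flat 03 parts and an OCCURS ... DEPENDING ON clause looks like this:

      *> Hypothetical reconstruction, not the subsystem's copybook.
      *> Two flat 03 parts; the repeating second part is the candidate
      *> for promotion to a separate class (here: address). The fields
      *> NAME, INITIALS, PREFIXES and TITLES-CODE are the ones we later
      *> grouped into the aggregate full-name.
       01  RELATION-RECORD.
           03  RELATION-PART.
               05  NAME             PIC X(35).
               05  INITIALS         PIC X(10).
               05  PREFIXES         PIC X(10).
               05  TITLES-CODE      PIC X(2).
               05  NR-OF-ADDRESSES  PIC 9(2).
           03  ADDRESS-PART OCCURS 1 TO 9 TIMES
                   DEPENDING ON NR-OF-ADDRESSES.
               05  STREET           PIC X(24).
               05  HOUSE-NR         PIC X(6).
               05  CITY             PIC X(24).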

5.3. Object driven experiments

We profited from the knowledge we gained from our previous experiments and from knowledge gained by talking to programmers who were involved in the original implementation or in the current maintenance of the system. From this knowledge we created some object oriented models using the Unified Modeling Language [2]. In the following we restrict our scope to one particular aspect of the system: relations moving to another living address. In the original subsystem the new address of a relation, together with the moving-day, are recorded in the file Relations. A batch job goes through this file every night and changes the current address to the future address for every relation whose moving-day has come. This moving functionality should be assigned to one or more objects. We could turn the batch job into an object that asks every relation every night if he will move, but we do not find this a satisfactory solution, because there is no object in the real world which expresses this behavior. Thus the design question is who (which object class) should be responsible for the moving of a relation. In our view this is the responsibility of the relation itself. We ignore all other behavior of relations for a while and suppose that a relation has two methods (in pseudo-code):

    Class Relation
        methods
            Input-future-address(address)
            Give-address: returns Address

In this limited scope, a relation has two states: a 'normal' state in which he has no known plans to move, and a 'will move' state in which he has such plans. Figure 6 illustrates this situation.

Figure 6. State diagram for relation (states: normal, will move; transitions labelled Give-address, Input-future-address, Give-address [moving-day passed], Give-address [moving-day to come])

When a relation object is in its normal state and an object asks for its address by calling the method⁴ Give-address, it gives its address and remains in the normal state (a transition to the same state). When a message reaches the company that a relation will move (either by mail or via the Internet or whatever), this address is provided to the relation object via the method Input-future-address. This will cause the object to make a transition to the 'will move' state. If it is now asked to give its address, the result depends on the moving-day. When the relation has not yet moved, he remains in the 'will move' state and the current address is returned. When a relation has moved, the current address is changed to the future address before it is returned, and the object makes a transition to the normal state. For the sake of completeness we allow a relation which already told us he would move to move to yet another address (he might have misspelled it or bought another house instead). This results in an arrow from the 'will move' state back to the 'will move' state, labeled with Input-future-address. Note that the moving of relations is now totally hidden from the rest of the application. Any object can ask a relation object for its address without ever having to know that a relation can do such a thing as move, let alone that an operator has to remember this every night.

⁴ Or passing the message, depending on the terminology used.
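Realized in OO COBOL, the two methods and the transitions of Figure 6 might look as follows. This is a sketch under assumed ISO-style syntax; the data item names, the state encoding and the yyyymmdd date comparison are our own choices, not taken from the subsystem.

       CLASS-ID. Relation.
       OBJECT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  current-address    PIC X(60).
       01  future-address     PIC X(60).
       01  moving-day         PIC 9(8).         *> yyyymmdd
       01  relation-state     PIC X VALUE "N".  *> N = normal, M = will move
       PROCEDURE DIVISION.

       METHOD-ID. Input-future-address.
       DATA DIVISION.
       LINKAGE SECTION.
       01  lnk-address        PIC X(60).
       01  lnk-moving-day     PIC 9(8).
       PROCEDURE DIVISION USING lnk-address lnk-moving-day.
      *>   normal -> will move (or will move -> will move, when the
      *>   relation reports yet another address)
           MOVE lnk-address    TO future-address
           MOVE lnk-moving-day TO moving-day
           MOVE "M"            TO relation-state
           GOBACK.
       END METHOD Input-future-address.

       METHOD-ID. Give-address.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01  today              PIC 9(8).
       LINKAGE SECTION.
       01  lnk-address        PIC X(60).
       PROCEDURE DIVISION RETURNING lnk-address.
           IF relation-state = "M"
               MOVE FUNCTION CURRENT-DATE(1:8) TO today
               IF today >= moving-day
      *>           moving-day passed: complete the move, back to normal
                   MOVE future-address TO current-address
                   MOVE "N" TO relation-state
               END-IF
           END-IF
           MOVE current-address TO lnk-address
           GOBACK.
       END METHOD Give-address.
       END OBJECT.
       END CLASS Relation.

Note how the nightly batch job disappears in this design: the state transition happens lazily, the first time the address is asked for after the moving-day.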

6. Conclusions and directions for further research

In this article we have proposed an incremental approach towards the identification (and optionally the implementation in a hybrid architecture) of objects in legacy systems. We have discussed different scenarios which can be used alternately in the incremental steps.




The function driven scenario leaves the existing application structure intact but presents the functionality of the selected part in terms of objects to the rest of the application. This scenario is therefore very well suited for hiding technical details, though the identified objects will be rather technical as well.

The data driven scenario results in objects which are more like 'business objects', because the data model is more stable and domain oriented than the functional application structure. However, this data model may have to be reverse engineered first when it has been flattened for implementation reasons, as was the case in the subject system of our experiments. Our experiments showed that doing this automatically requires sophisticated data flow analysis and other techniques. Further research and experimental work needs to be done in this area.

The object driven scenario requires a thorough understanding of both (part of) the domain model and the original application. As human creativity is involved here, this process can never be fully automated and thus does not scale up very well. However, the objects identified in this scenario will be the most business-like.

In our experiments we only identified objects in one subsystem of the mortgage system. The behavior of the objects may cross subsystem boundaries, because traditional modularization criteria differ from object oriented ones. This problem may occur at any level of granularity with every function driven step, and we encountered the phenomenon in our experiments as well. Future research must find ways to handle such situations and to prevent several objects which represent the same business object from being identified in different subsystems.

Encouraged by the results of the research presented here, in the Resolver project we will further investigate object identification by grouping data fields based on their mutual usage (the data driven scenario). We plan to experiment with cluster algorithms for this purpose. An object oriented redesign will be made to guide the tuning of the cluster algorithms to suit our purpose (the object driven scenario).

Acknowledgements

The authors want to thank Paul Klint (CWI/University of Amsterdam), Ryan Smits, Mark Vermeulen and Wim Mekkring (all DPFinance), and Jan-Willem Hubbers (ID Research) for their support.

References

[1] G. Booch. Object-Oriented Design with Applications. Benjamin Cummings, Redwood City, California, 1991.
[2] G. Booch, I. Jacobson, and J. Rumbaugh. The Unified Modeling Language for Object-Oriented Development, 1996. Documentation set. http://www.rational.com/ot/uml.html.
[3] M. Brodie and M. Stonebraker. Migrating Legacy Systems: Gateways, Interfaces, and the Incremental Approach. Morgan Kaufmann Series in Data Management Systems, Jim Gray, Series Editor. Morgan Kaufmann Publishers, San Francisco, CA, Apr. 1995.
[4] E. J. Chikofsky and J. H. Cross. Reverse Engineering and Design Recovery: A Taxonomy. IEEE Software, 7(1):13-17, 1990.
[5] C. J. Date. An Introduction to Database Systems, volume 1. The Systems Programming Series. Addison-Wesley Publishing Company, 1986.
[6] J. DeBaud. Using Executable Domain Models to Implement Legacy Software Re-Engineering, 1996. Position paper.
[7] H. Fergen, P. Reichelt, and K. P. Schmidt. Bringing Objects into COBOL: MOORE - a tool for migration from COBOL85 to object-oriented COBOL. In Proceedings of the Conference on Technology of Object-Oriented Languages and Systems (TOOLS 14), pages 435-448. Prentice-Hall, Aug. 1994.
[8] T. Grotehen and R. Schwarb. Implementing Business Objects: CORBA Interfaces for Legacy Systems, 1996.
[9] I. Jacobson and F. Lindstrom. Re-engineering of old systems to an object-oriented architecture. In OOPSLA '91 Conference Proceedings, pages 340-350, 1991.
[10] P. Newcomb and G. Kotik. Reengineering Procedural Into Object-Oriented Systems. In Proceedings of the 2nd Working Conference on Reverse Engineering, pages 237-249, July 1995.
[11] J. Ning, A. Engberts, and W. Kozaczynski. Recovering Reusable Components from Legacy Systems. In [18], pages 64-72, 1993.
[12] R. Obin. Object Orientation - An Introduction for COBOL Programmers. Micro Focus Publishing, Palo Alto, 2nd edition, 1995.
[13] R. D. Penteado, F. S. R. Germano, and P. C. Masiero. An Overall Process Based on Fusion to Reverse Engineer Legacy Code. In [19], pages 179-188, 1996.
[14] H. Sneed. Encapsulating Legacy Software for Use in Client/Server Systems. In [19], pages 104-119, 1996.
[15] H. Sneed. Object-Oriented COBOL Recycling. In [19], pages 169-178, 1996.
[16] Systems Techniques, Inc. Wrapping Legacy Systems for Reuse - Repackaging vs. Rebuilding. White paper, 1996. http://systecinc.com/white/whitewp.htm.
[17] D. Taylor. Object-Oriented Technology: A Manager's Guide. Servio, Alameda, 1990.
[18] R. C. Waters and E. J. Chikofsky, editors. Proceedings of the Working Conference on Reverse Engineering. IEEE Computer Society Press, 1993.
[19] L. Wills, I. Baxter, and E. Chikofsky, editors. Proceedings of the Third Working Conference on Reverse Engineering. IEEE Computer Society Press, 1996.
[20] R. Wirfs-Brock, B. Wilkerson, and L. Wiener. Designing Object-Oriented Software. Prentice-Hall, Englewood Cliffs, New Jersey, 1990.
[21] H. Yang and K. Bennett. Acquisition of ERA Models from Data Intensive Code. In Proceedings of the IEEE International Conference on Software Maintenance, pages 116-123. IEEE Computer Society Press, Oct. 1995.
[22] H. van Zuylen, editor. The REDO Compendium: Reverse Engineering for Software Maintenance. Wiley, 1993.