Integrating Legacy Systems with Modern Corporate Applications

Paul Robertson

Implementing the new without compromising the old.

The rate of introduction of new technologies in the information technology realm has grown enough to become a problem in itself. IS departments are struggling to maintain personnel capable of managing the onslaught of new technologies while simultaneously providing timely applications in response to demands for greater access to corporate data for both internal and customer purposes. Some departments have been addressing these problems by employing technologies developed for artificial intelligence applications, where similar problems exist—namely, that project goals and definitions change rapidly, and that complex data systems that are themselves in continual flux must be tracked. Such technologies include dynamic object-oriented programming (OOP), reflection, and domain-specific programming languages. More than ever, programs that are built today are fluid in their design, and yet the technologies that are commonly used to address them remain rooted in a static view of the program development cycle. Knowing that modern applications are expected to have a long life, but must also survive in an increasingly fluid environment with frequent design changes, leads us to ask: How should these applications be designed and implemented?

The example provided here illustrates how many of these issues can be effectively addressed by taking a dynamic approach to complex system design. The effectiveness of three related technologies is demonstrated: dynamic OOP, domain-specific embedded languages, and reflection. We contrast this approach with the continued use of static languages (such as Java) for addressing these issues.

Dynamic OOP allows aspects of an application design, as far as it is represented by objects (usually through the use of classes), to be modified dynamically—even while the program is running. This ability has far-reaching consequences that range from modifying running programs without having to stop them, to a style of application design that is itself viewed as dynamic.

Frequently in solving complex problems, it is useful to simplify the problem by first designing a language specific to the problem domain that allows the problem to be addressed in native terms. The representation of the solution is then much more straightforward. This approach of building domain-specific languages is a powerful idea that has been prevalent in some circles for decades, but has rarely been employed in mainstream computing because the languages commonly employed there lack the extensional capabilities needed to build domain-specific languages easily.

Brian Smith [7] introduced procedural reflection as a way of designing a programming language that could manipulate its own semantics. CLOS made some of these ideas practical in Common Lisp [8] by providing support for metaobject-level programming [4]. Embedding meta-level semantics within a program allows programs to be written that attend to their own implementation in the execution of their tasks, and thereby cope with certain changes that occur over time without explicit reprogramming. By allowing a clear separation between meta-level semantics and program-level code, the intent of the program code can be rendered in a more straightforward manner, making the code easier to manage. These are important considerations for evolving systems.

These ideas, along with dynamic programming techniques, are currently helping some IS departments meet the challenge that the fluid technology environment of the 1990s presents. This article describes how these technologies have been used in one such project by Kaiser Permanente, a health care provider.

Understanding the relevance of any technology in solving information processing problems demands a clear understanding of the problem domain to which it is being applied. It is often the nature of the problem domain that determines the success of applying particular technologies.

This article describes parts of a collection of related projects in which dynamic OOP and reflection were used to build a bridge between existing legacy systems and modern client-server, database, and intranet demands. We believe this experience is typical of issues being faced in many different companies in many different business areas, as the drive to provide greater data access through intranets, client-server, and help-desk applications forces legacy data to be recast and reused. The particular example used in this article was developed by a health care provider, but similar issues pertain to a wide variety of businesses.

Background

Discussions about legacy systems and data often center around the maintenance of old code and making an existing system survive an upgrade of hardware, operating system, or database vendor. In this article we are not interested in the problems of maintaining a legacy system, but in what the role of a legacy system is and how it relates to modern software demands. Typically, a legacy system is an in-place structure that is neither optimal for modern needs nor modifiable for project purposes. In dealing with legacy systems, it is important to understand the forces that govern their existence.

Legacy systems for the most part were developed by a previous generation of developers—hence the term legacy. That the code is often voluminous and hard to modify is of little interest to us in this instance. The crucial issue with legacy systems is that they are generally wired into the running of a business in a very substantial way. The security of the legacy systems is of utmost importance.

In the case of a medical service provider, the legacy systems in place govern data such as membership, subscription information, pharmacy, drugs, appointments and encounters, and billing. The systems that feed this data come from a variety of sources, including online connections to pharmacies and data input from forms completed in doctors' offices by doctors and patients. The accuracy of the data is required for a wide variety of reasons, such as: accurate billing of patients; accurate billing of the government, where appropriate, for certain classes of patients; accurate payments to service providers including consultants, physicians, and pharmacies; and establishment of appropriate member status when a patient enters a clinic. In short, the whole business rests on the in-place system. A significant amount of computer resources and human support personnel is utilized in keeping this system of information flow running smoothly.

The systems described are undergoing continual change and are not the fixed, rigid systems that are sometimes described. The total system is usually quite large compared to the part that is being changed, and the change can be compared to the movement of a glacier.

The legacy data described here has a great many beneficial uses beyond the strict running of the business that it is designed to support. The kinds of constraints on these other demands are of particular interest to this author. The increase in computing resources and technologies capable of providing access to data drives demand for greater access, and this is where the old and new come together. The integrity of the old cannot be compromised in the implementation of the new. There are many areas in which legacy data can be put to use. Some examples follow:

• Uses in marketing. In the case of a health care provider engaged in competitive practices, there is a need to demonstrate value to prospective customers both at the corporate level and the individual level. Legacy data contains information from which much useful data can be extracted, including effectiveness of treatment and effectiveness of certain programs, such as preventative health plans. The programs offered can be tailored to promote cost-effective health by stressing preventative medicine, exercise, and wellness programs.

• Uses for executives. While the primary use of a legacy system is the actual running of a business, an executive requires accurate information about how the system is running. This includes things like categorizing patients and analyzing costs by patient category, and utilization rates of care facilities. Better information can lead to a more efficiently run operation.

• Uses in government reporting. Government reporting is an issue that changes over time depending on the funding of government programs such as Medicare, and legislation to manage health care costs. Sometimes government reporting requirements will require the legacy system to be extended to incorporate additional data, but often the data already exists and new reports may need to be generated from the existing legacy system.

• Uses in competitive analysis. In an era in which health care costs are continually being scrutinized, there is an increasing need to show cost-effectiveness and to provide data that compare costs between different health care providers. This category straddles those of government reporting and marketing.

• New access to data. Client-server and intranet projects are now allowing greater access to data by a wider range of personnel than legacy systems were designed to accommodate. This includes physicians, pharmacists, sales and marketing personnel, and—to a limited extent—health plan customers.

It is clear that the legacy data represents a wealth of information that can be used in promoting a company's products, in running a business efficiently, and in providing competitive services. The effectiveness of preventative health programs, such as exercise, or early testing, such as mammograms, can be measured. Safety of treatments can be managed more effectively by accessing historical data. The value of legacy data is apparent, the pressure to take advantage of this wealth of data is extremely high, and the effectiveness of an information technology group can be measured by the speed at which the opportunities are realized.

Fundamental Concepts

The system developed at Kaiser Permanente is designed around the idea that data residing in a variety of diverse databases represents a collection of objects that describe various parts of the operation of a business, and that by combining these objects in a coherent way and viewing them as a common collection, the complexities surrounding the modern usage of the data can be reduced.

Database Distribution, Replication, and Updating

Several factors suggest an architecture for client-server and intranet solutions in a legacy data environment:

• The legacy system is generally not allowed to be modified in support of these new applications (because to do so would jeopardize the robustness of the information processing backbone that runs the entire business).

• The data often resides not in a single database in a single format, but is distributed over a number of databases, often from different vendors, running on different machines, often with significant physical distances between the separate services.

There is often a need to use data from a variety of sources for report generation, construction of management information systems, or the creation of client-server or intranet-based applications. Such integration efforts are often based on an architecture that dedicates one or more systems to serve as repositories of extracted legacy data, recast as integrated objects that reside in an Integrated Object Database (IODB). While legacy systems evolve at a very slow rate, the IODB and the applications that utilize it evolve at a very high rate. Any system that caches data must deal with issues of coherency, and when data is combined from multiple sources, synchronization issues must be addressed. Usually these issues require a detailed understanding of the rules of the particular business for which the system is being designed. These issues are complex, and are beyond the scope of this article.

Dynamic OOP

Treating the legacy data as objects whose definitions are immutable, and treating the IODB as a dynamic object system whose definition is fluid, allows data integration issues to be viewed as object mapping. Dynamic programming techniques allow rapid change exactly where it is needed: in supporting the somewhat fluid IODB as the data needs of various client-server and intranet projects change. Because the IODB is designed to support specific projects rather than to duplicate everything that is contained in all legacy databases, it is clear that as data needs grow, the need to expand the definition of the IODB is a natural consequence. This also goes hand in hand with new data mapping and modeling needs. Dynamic OOP, therefore, is a powerful approach to building a system whose exact definition is best viewed as dynamic. In this case the definition of the IODB is rather fluid, but the applications are expected to continue to work as the IODB changes. This normally disastrous scenario is rendered manageable by using dynamic OOP.
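Dynamic OOP is easiest to picture in a language whose class definitions are ordinary runtime objects. The following minimal sketch uses Python purely for illustration (the article's system was not written in Python, and the names here are hypothetical): an IODB-style class gains a new attribute and new behavior while the program runs, and instances created under the old definition keep working.

# Illustrative sketch only: a class whose definition can grow while the
# program runs, in the spirit of dynamic OOP. Names (Member, plan_code)
# are invented, not taken from the article.

class Member:
    def __init__(self, number, address):
        self.number = number
        self.address = address

existing = Member("12345", "100 Main St")   # created under the old definition

# A new project needs a plan code on every member. With dynamic OOP the class
# is extended in the running system; no restart, no recompilation.
def plan_code(self):
    # Default for members loaded before the attribute existed.
    return getattr(self, "_plan_code", "UNKNOWN")

def set_plan_code(self, code):
    self._plan_code = code

Member.plan_code = plan_code          # add behavior to the live class
Member.set_plan_code = set_plan_code

# Instances created before the change pick up the new behavior immediately.
print(existing.plan_code())           # -> "UNKNOWN"
existing.set_plan_code("HMO-2")
print(existing.plan_code())           # -> "HMO-2"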

Reflection

The legacy database scenario depicted here consists of a variety of databases, operating systems, and machine types. At a data-mapping level, the databases all implement immutable class systems whose objects are implemented via the hardware and software configurations that host them. Complexity can be minimized, and robustness improved, by using a reflective programming approach to abstract the object system implementation details out of the data definitions. Reflection permits uniform semantic and syntactic access to objects that are conceptually the same but actually have very different implementations—for example, objects that come from different databases. The uniform syntax is achieved by implementing the class descriptions themselves in an object-oriented way. Suppose an object is implemented in Sybase as a row of some table, where the columns are the object's attributes. A core method—for example, looking up the value of an attribute of an object—can dispatch on the class description of that object and go through the Sybase driver to fetch the value. Core methods have the same syntax regardless of the underlying object implementation. The uniform syntax achievable through reflection results in a dramatic simplification in code and a corresponding increase in readability and robustness. The use of reflective ideas for building interfaces to databases has been demonstrated by Paepcke [5], and the application of similar techniques to heterogeneous object systems in general is also well established [6]. Reflection, therefore, is a powerful tool in managing a heterogeneous set of object systems—in this case, databases implemented on different systems.
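A minimal sketch of the reflective idea, again in Python purely for illustration rather than the language used for the actual system: class descriptions are ordinary objects, and one core operation (attribute lookup) dispatches on the description of how each object happens to be stored, so every caller uses the same syntax regardless of the backing database. All class, method, and table names below are hypothetical.

# Illustrative sketch, not the article's implementation. The meta level
# (how an object is stored) is separated from the program level (what the
# code wants to say about the object).

class StoreDescription:
    """Meta-level object describing how instances of a logical class are stored."""
    def get_attribute(self, obj, name):
        raise NotImplementedError

class RelationalRowStore(StoreDescription):
    """Object lives as a row in a relational table; the columns are its attributes."""
    def __init__(self, connection, table, key_column):
        self.connection = connection      # any DB-API style connection (placeholder)
        self.table = table
        self.key_column = key_column

    def get_attribute(self, obj, name):
        # Table and column names come from trusted class descriptions; the
        # parameter marker style depends on the actual driver.
        cursor = self.connection.cursor()
        cursor.execute(
            f"SELECT {name} FROM {self.table} WHERE {self.key_column} = ?",
            (obj.key,),
        )
        return cursor.fetchone()[0]

class KeyedFileStore(StoreDescription):
    """Object lives as a record in a keyed file (a stand-in for a Pick-style store)."""
    def __init__(self, records):
        self.records = records            # e.g. {key: {field: value}}

    def get_attribute(self, obj, name):
        return self.records[obj.key][name]

class IntegratedObject:
    """Program-level object: knows only its key and its store description."""
    def __init__(self, key, store):
        self.key = key
        self.store = store

    def get(self, name):
        # Uniform syntax: the same call works for every store description.
        return self.store.get_attribute(self, name)

# member.get("address") reads the same whether the member record lives in a
# relational table or a keyed legacy file; only the StoreDescription differs.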

Domain-Specific Languages

A key component of dynamic OOP is the ability to quickly build domain-specific languages in order to simplify programming the domain-level problem. There are two ways in which this approach manifests itself. The first is that the implementation language is extended in domain-specific ways; the second is that a completely separate domain-specific language is constructed for the purpose of developing the target application. If the problem can be expressed in the natural language of the domain, a minimal amount of coding is demanded, and the solution can be kept simple. In this case, a domain-specific mapping language is implemented that allows the mapping of data to be described simply in terms of data objects, without worrying about how the objects are implemented in a particular database solution. The separation of data modeling from data implementation that the use of a domain-specific language provides is a major aid in managing complexity. Since implementation and data logic are kept separate, as the implementation details change over time the mapping logic is much less frequently affected.

History and Architecture

Around 1993, a rudimentary data conversion capability had been built in C++ that imported legacy data from a Honeywell system and transformed the data on the fly into a Sybase database, based on data mapping that had been determined for a particular new application project. The C++ program would be run from time to time, and would dump data from the legacy system into the Sybase database. The system worked well, but was rigid. Because the data mapping was performed by a hand-coded C++ program, it wasn't easy to change the way data was mapped, or to change what data was mapped. It was clear that a more flexible data-mapping capability was required to be successful in supporting the growing interest in client-server projects. Furthermore, legacy data was also needed from IBM mainframes and perhaps other sources too. It was clear that the data-mapping exercise was one that would be subject to continual change, and that it was important for data analysis personnel without C++ expertise to be able to specify data mappings from the legacy environment to the application repository database.

The architecture illustrated in Figure 1 was designed to meet these needs. The architecture employs three reflective layers that protect the data-mapping environment from the legacy system (A), the new application system (B), and the domain-specific language implementation (C). Built around these reflective layers are the five interacting software components shown in the following paragraphs.

Figure 1. Architecture (legacy applications and databases, data pump, data converter, data mappings, browser, encyclopedia, and the new applications data repository; A, B, and C mark the three reflective layers)

Data Mappings

The data mappings are declarative descriptions written in a simple domain-specific language that describe how data is mapped from the legacy systems to the target systems. Because the legacy systems and the target systems are described in abstract terms behind the reflective layer, the data elements are mapped in logical terms, allowing details of their underlying implementations to change without breaking applications and supporting the validity of the data mapping logic. This has the added benefit of keeping the mapping language very simple, so that data analysts can express data mappings in logical terms without worrying about programming details.

Data Converter

The data converter is a two-stage piece. The data mappings are compiled by the converter into an optimized data movement engine that can be run periodically to cause the new applications data repository to be updated to reflect changes made in the legacy environment.

Data Pump

The movement of bits between the legacy system and the repository is mediated by a cross-platform data pump driven by a scheduler that allows updates at predetermined times or intervals. Time stamps are used to select updates from the legacy system and to incrementally feed the data back to the repository staging area.

Browser

The browser shows the relationship between data elements and represents the mappings between legacy data and target data. The data elements are themselves represented as abstract objects. The browser not only supports viewing of the data mappings, but also the editing of existing mappings and the creation of new maps. The browser is a window into the virtual representation of the data that allows the data to be viewed as objects rather than as arbitrary database implementations.
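The scheduling and time-stamp selection described for the data pump above amounts to a simple incremental-extract loop. The sketch below is a hypothetical Python illustration of that idea, not the actual pump; the table layout, the last_modified column, and the staging interface are invented for the example.

# Hypothetical sketch of a time-stamp driven incremental extract, in the
# spirit of the data pump described above.
import time

def pump_once(legacy, staging, table, last_run):
    """Copy rows changed since last_run from the legacy table into staging."""
    cycle_start = time.time()                 # conservative new high-water mark
    cur = legacy.cursor()
    cur.execute(f"SELECT * FROM {table} WHERE last_modified > ?", (last_run,))
    staging.write(table, cur.fetchall())      # staging area exposed by the repository
    return cycle_start

def run_on_schedule(legacy, staging, tables, interval_seconds):
    """Very small scheduler: pump each table, then sleep until the next cycle."""
    last_run = {t: 0.0 for t in tables}
    while True:
        for t in tables:
            last_run[t] = pump_once(legacy, staging, t, last_run[t])
        time.sleep(interval_seconds)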

Encyclopedia

The objects edited by the browser are stored in a Lotus Notes database, allowing the data definitions to be easily reviewed by developers of reports and developers of client-server or intranet applications. The Lotus Notes database provides a forum for discussing the addition of new data elements required by changing application goals.

The Domain-Specific Language

The data converter language is designed to allow a non-programmer to move data from one database implementation to another even when the mapping is quite complex. The data converter language is database neutral, and can in principle be applied to a multitude of databases. Initially, a few specific databases are supported by the reflective layers, but others are easily added.

Design Principles

One goal of the data converter has been that the language be easy to read, easy to write, and that the rules can be managed within a simple framework. The language is declarative, not procedural, so this approach comes very naturally. Relationships between legacy data and mapped data are described in terms of declarative relationships between the objects, and the generator compiles these relationships into code that iterates through the source data and constructs the target tables.

There are two executable programs involved in the data conversion process. The first program is the data converter generator, shown at the top of Figure 2. The generator takes mapping rules defined in the data converter language and generates a program that is subsequently translated into the data converter program for the specific mapping defined by the input rule set.

Figure 2. Data flow (the generator produces the data converter, whose input and output sides connect to databases such as Pick, DB2, Foxpro, and Sybase, with temporary and index storage)

The data converter program consists of three logical components: an input reflective layer, an output reflective layer, and the generated data converter program. The generated data converter program is the translation of the mapping objects; the input reflective layer implements the reading of the input database; and the output reflective layer implements the writing of the target database. In principle, therefore, the data converter can convert from any database to any other database as long as the necessary input and output reflective layers have been implemented.

The generator is run just once, in order to create the data converter. Subsequently, the generated data converter is run whenever a data conversion is required. Whenever the mapping needs to be modified, the mapping objects can be edited using the browser and the generator can be run again to create a new data converter program.

While space prevents a detailed description of the data converter language and related tools, the interested reader is referred to http://www.doll.com/datacvtr.html for a detailed description.

The data converter language is an object-oriented, declarative, rule-based language. That means that the mapping is described in terms of database objects (tables, columns, and database implementations), that the mapping is described as a sequence of declarations about the relationships between data objects, and that these declarations take the form of simple discrete rules. In general, an object can be written and modified in isolation, and will usually be simple.

A database is considered to be a logical entity by the translator language. It serves the role of linking together database implementations of the same data on different platforms. Each database implementation consists of a collection of tables, and each table consists of a collection of columns. The logical database in Figure 3 includes tables from three physical databases: Pick, DB/2, and Sybase.

Figure 3. Logical database (DB2 tables, Pick tables, and Sybase tables grouped as one logical database)
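The two-program structure—a generator that compiles mapping rules once, and the data converter it produces, which is run whenever data needs to be moved—can be sketched roughly as follows. This is an illustration of the idea in Python; the rule notation, function names, and reflective-layer interface are invented, not the actual data converter language.

# Hypothetical sketch of the generator/converter split. The generator turns
# declarative rules into a converter once; the converter is then run whenever
# a conversion is required. Rules here look like "MBR.NUM -> Member.Number".

def generate_converter(rules, input_layer, output_layer):
    """Compile 'SourceTable.Col -> TargetTable.Col' rules into a converter."""
    compiled = []
    for rule in rules:
        src, dst = (end.strip() for end in rule.split("->"))
        src_table, src_col = src.rsplit(".", 1)
        dst_table, dst_col = dst.rsplit(".", 1)
        compiled.append((src_table, src_col, dst_table, dst_col))

    def convert():
        # The reflective layers hide how each database is actually read and
        # written. One column at a time here for brevity; a real converter
        # would group the rules by table and move whole records.
        for src_table, src_col, dst_table, dst_col in compiled:
            for record in input_layer.read(src_table):
                output_layer.write(dst_table, {dst_col: record[src_col]})
    return convert

# Run the generator once...
#   converter = generate_converter(rules, pick_layer, sybase_layer)
# ...then run the produced converter whenever a conversion is required:
#   converter()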

Mapping consists of defining links between the columns of one database implementation and another. In Figure 4, we see a portion of the mapping for the logical database Pharmacy. This database is made up of tables, including the Medical Records, Member, Subscriber, and Group tables, among others. In Figure 4, the Subscriber table is further broken down into columns (otherwise called fields): Number, Address, Zip, and Phones. In writing a mapping, a set of objects are defined that describe:

• The logical database—with a database object.
• The tables of each database implementation—with table objects.
• The columns of each table—with column objects.
• The mappings between columns—with mapping objects (see Figure 5).

Figure 4. Tables and columns (the logical database Pharmacy contains the Medical Records, Member, Subscriber, and Group tables; the Subscriber table contains the columns Number, Address, Zip, and Phones)
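The kinds of objects listed above can be made concrete with a small sketch, again in Python purely as illustration and not in the syntax of the actual data converter language. The legacy-side column names are invented; the Pharmacy/Subscriber names echo Figure 4.

# Hypothetical sketch of the objects a mapping is written in terms of:
# a logical database, its tables and columns, and mappings between columns.
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str

@dataclass
class Table:
    name: str
    columns: list = field(default_factory=list)

@dataclass
class LogicalDatabase:
    name: str
    tables: list = field(default_factory=list)

@dataclass
class Mapping:
    source: Column      # column in one database implementation
    target: Column      # corresponding column in another implementation

# The logical database "Pharmacy", with the Subscriber table broken down
# into columns, as in Figure 4.
subscriber = Table("Subscriber",
                   [Column("Number"), Column("Address"), Column("Zip"), Column("Phones")])
pharmacy = LogicalDatabase("Pharmacy",
                           [Table("Medical Records"), Table("Member"),
                            subscriber, Table("Group")])

# A mapping links a legacy column to its target column (Figure 5); the
# legacy-side column names below are invented for the example.
mappings = [
    Mapping(source=Column("SUBSCR-NO"),   target=subscriber.columns[0]),
    Mapping(source=Column("SUBSCR-ADDR"), target=subscriber.columns[1]),
]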

Figure 5. Mappings (mapping objects link columns of tables in one database implementation to the corresponding columns in another)

The Data-Conversion Model

Data conversion is accomplished by passing a source table. Passing a table involves sequentially accessing and processing each record of the source table. During the processing of each record, a number of things can happen:

• The record can be ignored.
• The record can be treated as a collection of parts.
• Other records can be looked up.
• Flags can be set to mark interesting records.
• A record in the destination can be created and populated.

Records can be ignored because the record doesn't fit some selection criteria, such as not falling within the date range, or not having a mandatory field. In general, one can specify an arbitrary expression that defines when a record should be used.
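The record-level processing just described can be pictured as a single pass over the source table with a per-record rule. The following minimal Python sketch is a hypothetical illustration of that model; the field names and the selection criterion are invented.

# Hypothetical sketch of "passing" a source table: each source record is
# either ignored, used to look up related records, or turned into a
# populated destination record.

def pass_table(source_records, lookup, destination, since_date):
    for record in source_records:
        # Ignore records that do not fit the selection criteria
        # (here: outside the date range, or missing a mandatory field).
        if record.get("date") is None or record["date"] < since_date:
            continue
        if not record.get("member_number"):
            continue

        # Other records can be looked up while this one is processed.
        member = lookup("Member", record["member_number"])

        # Create and populate a record in the destination.
        destination.append({
            "member_number": record["member_number"],
            "member_name": member["name"],
            "encounter_date": record["date"],
        })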

Conclusions

Modern software development has taken a multitude of directions that simultaneously bring unprecedented access to corporate data and frustrate efforts to control software costs. The monolithic software development paradigms of the early years of the industry have yielded to diversity that grows at an increasing rate. Legacy software systems evolve at a sometimes painfully slow pace against an established bureaucracy, while modern software systems evolve at a pace that challenges corporate efforts to introduce new products. One example of such rapid change is the swift replacement of client-server applications with intranet-based solutions. This change is occurring more swiftly than the rate of production of typical corporate client-server applications, making such applications obsolete before they are completed. The rapid growth of new object systems in the form of databases, languages, operating systems, and integration components (such as client-server tools and Web browsers) poses new challenges for corporate application development.

The details of the described application, as in any real-world problem, are significant, and space only permits an overview of the solution. Nevertheless, certain significant issues clearly stand out:

• Building domain-specific languages dramatically reduces the complexity involved in describing the problem solution. Languages that facilitate this form of solution are therefore highly beneficial.

• Systems that are expected to be dynamic in their specifications and designs, as so many commercial applications are, deserve to be implemented in a manner that matches these dynamic considerations. Dynamic OOP is a compelling approach that aids development time, ability to change over time, and application robustness.

• Diversity is a fact of life in commercial systems. Solutions from a multitude of vendors must be integrated. Integration of heterogeneous systems can be dramatically facilitated by a reflective approach. Reflection allows implementation of the data objects to be separated from the uses of the data. Another benefit of the reflective approach is that programs can be written that manipulate the meta representations in order to decide how to move and cache data most efficiently.

In an environment in which new technologies and solution paradigms are being developed at great speed while legacy systems are evolving slowly, a reflective programming approach not only provides a clean factorization for object system diversity, but offers a way of impedance-matching parts of cooperative software systems that are moving at very different speeds. Another advantage of this reflective approach is that it allows the binding of data elements to specific object systems to be changed over time without breaking the systems that depend upon them, because the reflective layer hides the implementation details of the abstract objects.

Experience with the project described in this article provides striking evidence of the effectiveness of using dynamic OOP for building key pieces of rapidly changing information-processing infrastructure. The system described was implemented in Yolambda [1], a Scheme-compatible language [2] with object-oriented extensions that support dynamic objects and reflective capabilities similar to those of CLOS [3]. Other languages supporting metaobjects and dynamic objects, such as Common Lisp, might have been used with equal success. The system was developed in a very short period of time and put into active service.

The trend toward greater access to data has been established and the pace of new developments in client-server and intranet technologies continues to grow, while legacy data issues continue to coexist. Technologies like dynamic OOP and reflective programming principles have a growing place in pulling the pieces together.

References
1. DOLL. Yolambda Reference Manual. Dynamic Object Language Labs, Inc., 1996.
2. IEEE Standard for the Scheme Programming Language. IEEE Standard 1178-1990, IEEE, Piscataway, 1991.
3. Keene, S.E. Object-Oriented Programming in Common Lisp: A Programmer's Guide to CLOS. Addison-Wesley, 1989.
4. Kiczales, G., and des Rivières, J. The Art of the Metaobject Protocol. MIT Press, 1993.
5. Paepcke, A. PCLOS: Stress testing CLOS—experiencing the metaobject protocol. In Proceedings of ECOOP/OOPSLA '90 (1990), ACM, New York.
6. Robertson, P. On reflection and refraction. In A. Yonezawa and B.C. Smith, Eds., Reflection and Meta-Level Architecture: Proceedings of the 1992 International Workshop on New Models for Software Architecture. Tokyo, 1992.
7. Smith, B.C. Reflection and semantics in Lisp. In Proceedings of the 11th Annual ACM Symposium on Principles of Programming Languages (Salt Lake City, Utah, Jan. 1984), pp. 23–35.
8. Steele, G. Common Lisp: The Language. Digital Press, 1984.

Paul Robertson ([email protected]) is Chief Technical Officer at Dynamic Object Language Laboratories, in Andover, Mass.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.

© ACM 0002-0782/97/0500 $3.50