Process Support for Incremental Component-Based Software Engineering of a Legacy System

Gail E. Kaiser, George T. Heineman∗, Peter D. Skopp†, Jack J. Yang

Columbia University
Department of Computer Science
1214 Amsterdam Avenue, Mail Code 0401
New York, NY 10027
212-939-7081 / fax: 212-939-7084
[email protected]

CUCS-007-96

May 27, 1998
© 1998 Gail E. Kaiser, George T. Heineman, Peter D. Skopp and Jack J. Yang

∗ Now at Worcester Polytechnic Institute, Department of Computer Science, 100 Institute Road, Worcester, MA 01609.
† Now at Juno Online Services, L.P., 120 West 45th Street, New York, NY 10036.
Abstract

Componentization is an important, emerging approach to software engineering, whereby new systems are constructed from relatively large-scale components intended to be used in a variety of systems. More significantly from the perspective of this paper, component-based software engineering (CBSE) also provides a road to modernization of stovepipe systems, which are restructured into components to ease continued maintenance. Selected components in the original system can then be completely replaced, e.g., the database or event controller, potentially in a family of configurations each including different realizations of the components. Of course, the newly separated components can also be reused in other systems. We have investigated software processes for re-engineering of legacy subsystems into components and re-structuring of the system as a whole to enable replacement of those components. While these processes could have been followed “by hand”, in principle, we were particularly concerned with how process and workflow management technology could assist and enhance CBSE. This paper describes our experience componentizing a medium-scale legacy system following two generations of CBSE processes supported by two generations of process-centered environments.

Keywords: Cross-Referencing, Object-Oriented Database, Process-Centered Environment, Process Evaluation, Process Evolution, Software Process, System Build, Tool Enveloping, Workflow Management.
1 Introduction
Componentization is an emerging approach to software engineering, whereby systems are constructed from relatively large-scale components intended to be used in a great variety of systems; this is distinct from previous trends both in reuse granularity, i.e., as opposed to composition of systems from small-scale modules, and in reuse frequency, i.e., where only mathematical subroutine libraries and a few other special cases have had much impact outside the development organization. The key paradigm shift seems to have been standardization of component frameworks, such as CORBA, ActiveX/DCOM, and JavaBeans [10, 12].

However, this paper is concerned with a less touted but potentially more significant application of component-based software engineering (CBSE), as a road to modernization of legacy systems. In particular, we are concerned with re-engineering selected subsystems of the legacy system, so that they are sufficiently disentangled from the rest of the system that they could, in principle, be reused themselves in other systems; and, in tandem, re-structuring the remainder of the system to sufficiently abstract the interfaces of those subsystems so that they can be replaced by other components. One application might be to upgrade portions of the stovepipe system to new technology, e.g., a new communication protocol such as HTTP or a new user interface such as a Java applet. A more ambitious goal might be to migrate to a new architecture, for instance, converting a monolithic system to the client/server paradigm, with some old components appropriately encapsulated to operate within the new structure. Of course, the newly separated components might actually be reused in other systems. But even if components from the legacy system are neither reused elsewhere nor replaced, the resulting system should be more easily maintainable than previously, since changes inside one subsystem should not impact other subsystems.

CBSE is substantially different from the prevalent “build from scratch” and complementary “fix/enhance/port the previous version” views of software engineering embodied in the mainstream process meta-models, that is, waterfall and the various incremental approaches: evolutionary, spiral, concurrent engineering, etc. While any of these meta-models might be (and presumably have been) adapted to the construction of new systems from components, there has been little experience reported in the literature regarding the characteristics of software processes that are, or should be, followed to apply CBSE to legacy systems.

We have investigated software processes for componentization of legacy systems. While these processes could have been enacted “by hand”, we focused on how process technology, particularly process-centered environments (PCEs) and the workflow enforcement and automation they implement, could assist and enhance CBSE. This paper describes our experience componentizing a medium-scale legacy system following two generations of CBSE processes supported by two generations of PCEs.

The target system was a prototype research vehicle originating in our lab in January 1987 and finally laid to rest in the summer of 1997, at that point consisting of approximately 300,000 lines of C code. Over 60 graduate and undergraduate students participated in the effort, most of them for only one semester as part of an independent study project for
academic credit, a handful for several years as paid research assistants. An “exploratory programming” style was typical, and almost all the “design documents” are in the form of research publications and theses; the resulting system certainly qualifies as stovepipe.

Because the target system was the main research product of our lab, and elaboration and realization of new research ideas were constant activities, the componentization effort from spring 1993 through spring 1997 was necessarily performed in an incremental manner at the same time as and interleaved with the “new” development work. It was simply impossible to suspend other changes to the system while componentization was in progress, to devote full attention to componentization (which is part of the reason it took so long, another reason being that the componentization was itself new research at the time), or even to fully separate the re-structuring/re-engineering process(es) from the exploratory development process operating simultaneously on the same code, literally, i.e., not another “version” of the code base. And since we were actually using the target system ourselves on a daily basis, we couldn’t afford to “break” the code for any non-trivial portion of time (more than a week or so).

Our first CBSE process, OzMarvel, was devised in spring 1993 and supported by a PCE called Marvel. EmeraldCity came on-line in spring 1995, supported by the successor PCE called Oz [3, 9]. Prior to OzMarvel we used a non-CBSE Marvel process called CMarvel, starting in January 1992, which mimicked our earlier work methods on bare Unix.

There were two main reasons for upgrading from OzMarvel to EmeraldCity. One was to bootstrap from Marvel to Oz as our platform to continue development of the target system. The Oz project is devoted in large part to componentization issues, while the predecessor Marvel project was not. Another important distinction between Marvel and Oz, for the purposes of this paper, is that a Marvel environment instance supports a single process that must be enacted by all users of that environment, although they would generally follow distinct workflows within that process, whereas an Oz environment instance supports interoperability among multiple processes and interactions among users carrying out workflows within different processes. A Marvel environment with an in-progress process, such as OzMarvel, can be converted to a single-process Oz environment, but as explained later EmeraldCity directly exploited Oz’s multi-process support.

More significantly, we needed to change the process independent of the supporting process technology: EmeraldCity is substantially different from OzMarvel in several dimensions, more directly concerned with CBSE, due to our mid-stream evaluation following about two years of experience using OzMarvel to divide the target legacy system into components and integrate experimental systems from those and external components. Thus the second reason was to incorporate what we had learned from our initial, in retrospect relatively naive, attempt at a CBSE process and continue our long-term componentization effort with the significantly better process (i.e., from the viewpoint of successfully supporting CBSE).

It is worth noting that the target system is Marvel and Oz; Oz was developed by extensive modifications to Marvel, and the software process community is well aware that they are different generations of what is in essence the same system.
Thus the process technology we describe was used to support componentization of the system implementing that same technology. However, there is nothing specific to PCEs as the target of either of our two CBSE processes or the process support technologies, so the approach should apply equally well to other medium-sized legacy systems. A PCE just happened to be the system we were componentizing and from which our experience is drawn; this was our real work, not an invented “case study”. Certain peculiarities of the CBSE processes are, however, specific to C programming, e.g., the distinction between source and header files, the use of prototypes and include files, and so on; we assume throughout the paper that the reader is generally familiar with ANSI C.

First we provide brief background on the Marvel and Oz process modeling languages (for writing down the processes) and workflow management engines (for guiding participants through enactment of the processes, enforcing process constraints and automating workflows). Then we describe the OzMarvel and EmeraldCity CBSE processes, including the requirements they were intended to fulfill, how they exploited the then-available process/workflow technology, and our experience using each instantiated PCE in our componentization efforts. The paper concludes by summarizing lessons learned. We do not present in this paper any details regarding support for evolution of the in-progress process through changes within or between our two processes or process technologies, but remark briefly on its occurrence.
2 Marvel and Oz Background
Marvel’s process modeling language defines a process as a set of tasks. Each task consists of three major parts:

1. A logical condition that should be true before the activity is initiated;

2. An activity that specifies what to do, often but not necessarily involving one or more external applications such as an editor, static analyzer, virtual whiteboard, desktop video conferencing, etc. An activity could also involve solely “low-tech” human operations such as holding an in-person meeting;

3. A set of effects, one of which is asserted after the activity is completed. The effects generally map to success and failure cases, but there may be more than two distinct effects to describe a variety of anticipated results.

Each task also has a name, a set of formal parameters (which are mapped to actual parameter data when the task is enacted), and a binding clause for obtaining implicit argument data via queries on the environment’s data repository and associating them with local variables. Task names may be overloaded, that is, multiple tasks with different parameter signatures may have the same name, such as edit. The activities themselves are defined by scripts, not in the process modeling notation. Most data is typed, by classes instantiated in the environment repository, although activities may take some data as literal strings. Thus every process model is accompanied by a data schema defining the composition of folders (contents
and attributes) and relationships among folders (containment and references analogous to shortcut links).

Oz’s process modeling language is nearly the same as Marvel’s. One major syntactic difference is that Oz includes means to optionally specify who is to do a particular task; this might be a particular userid, any user who fulfills a certain role, any member of a given group, or multiple members of a given group together for team tasks. Omitting the user information implies that any user can perform the task. Marvel had permitted the desired user or group to be specified in a task’s condition, but didn’t treat this information in any special way. Oz also permits the process designer to extend the process enactment engine by introducing new syntax into the process definition and realizing the semantics of that syntax through plugin code. And the semantics of built-in syntax can also be changed, to a degree, by plugin code.

Both Marvel and Oz operate as client/server systems. The client is relatively thin, and supports only the graphical user interface and an interface for starting/terminating activities, e.g., by launching scripts that in turn start up and interact with applications on the user’s desktop and/or tell the human(s) what to do and solicit confirmation that they have done so. The server performs all process/workflow enactment for multiple users, manages both process and product data in its object management system, and provides concurrency control and failure recovery. All user clients connected to the same server participate in the same process and their users are generally members of the same small team, usually consisting of fewer than ten people. In Oz’s case, multiple servers (or sites) may be allied, meaning they can interact with each other to support data sharing and cross-process workflows with respect to several closely cooperating teams. Allied sites may reside within the same local area network or be dispersed across the Internet or an organizational intranet.

Marvel’s workflow engine enforces process constraints, in the sense that it will not permit the user to perform an activity if the corresponding condition isn’t satisfied. Since all product data, even files from the file system, are accessed through Marvel, it can effectively control this aspect of the process (although, as in most systems that appear to enforce rigidity on human behavior but need to be used under deadline and other pressures suggesting relaxation of the rules, there are simple but undocumented workarounds).

Marvel’s workflow engine also automates certain facets of process enactment. In particular, when a user requests to enact a task but its conditions are not currently satisfied, Marvel will find, instantiate with the relevant parameters, and attempt to execute any other tasks whose effects may result in satisfying or partially satisfying the condition. When the effect of a completed task is asserted, Marvel will find, instantiate with the appropriate parameters, and attempt to execute any other tasks now enabled, i.e., whose condition has become true with respect to those parameters. Both automation mechanisms operate recursively and exhaustively, trying every possible combination of tasks that might fulfill the desired condition (note that fulfillment isn’t guaranteed just because one of a task’s effects satisfies the condition, because the result may be another one of the effects), and continuing to trigger additional tasks until no new conditions become satisfied.
The search and instantiation are reasonably efficient, because possible “chains” among tasks are precompiled when a given process model is loaded into the workflow engine. This automation operates by default, but can be “turned off” wholesale or with respect to particular predicates in conditions and particular assertions in effects. Usually only a small number of tasks are automatically invoked, after a user selects and parameterizes a task representing an entry point into a composite task consisting of one or a few main primitive tasks and a small number of other auxiliary tasks (not listed in the user menu but reached only via automation) to propagate changes and perform bookkeeping chores (e.g., the activity may be null, with then of course only one effect, or may simply invoke a batch utility in the background). But it is possible to define a large workflow as a single goal-driven or event-driven chain of tasks, which may be useful for simulation or training purposes. Data operations such as adding an object, deleting an object, introducing or removing a hypertext-like link between two objects, etc., are modeled as primitive tasks for a uniform approach, and different conditions and effects can be attached to such operations for different classes of objects (e.g., source code files versus design document sections would be instances of different classes).

Oz can do everything Marvel can do. Marvel-like enforcement of process constraints is the default, but the plugins might redefine the condition as “advisory”, in all cases or only under specified circumstances, with users notified of unsatisfied predicates but permitted to proceed to the activity, perhaps with logging of the exception and notification of appropriate supervisors. Marvel-like automation of sequencing among process tasks is not the default: automation must be explicitly “turned on” for specified predicates rather than the reverse, since we had found during our experience using Marvel that most predicates were marked as no chaining in our real-world process models. Oz’s open-ended extensibility enables backward and forward chaining to be pruned or augmented, performed in simulation (no side-effects) mode, elaborated breadth-first or depth-first, in parallel or serially, and so on.

But the most significant innovation Oz brings to process technology is to support alliances among two or more distinct sites, each with its own process served by a distinct instance of the workflow engine. These servers may dynamically form Treaties, which involves exchange of process modules (related sets of tasks) and mutual agreement to cooperate in performing these tasks. These process modules are automatically integrated with the server’s own autonomously devised local process. When one of these tasks is requested with parameter data from two or more of the participating servers, this is called a Summit. First the conditions are checked and perhaps satisfied via process automation by each site on its own data following its own local process, then the activity is performed on the data, and finally the proper effect is asserted and perhaps any implications fulfilled through process automation, again by each server on its own data according to its own process. Treaty tasks may also form chains as part of the same Summit. The activity of a Treaty task might be undertaken entirely within the purview of a single coordinating server, or involve users and/or applications from multiple sites.
Treaties and Summits mangle the International Alliance metaphor a bit, since Summits normally precede Treaties rather than vice versa, but the idea is that each site (country) continues to follow its own local process (laws and customs) to perform its obligations (prerequisites and implications of a Summit) under the multi-site Treaty. Treaties between Oz sites are set up on a pairwise basis that is neither symmetric nor transitive, so the connection graph need not be complete, although Summits can involve any number of sites that have agreed to the same Treaty.

While neither Marvel nor Oz is production-quality in the commercial sense, we used the technology on a daily basis for over five years and licensed the systems to about 50 institutions. Further details about Marvel can be found in [7, 1]; more on Oz is in [8, 4]. Note that for simplicity this paper refers only to the final versions of Marvel, 3.1.1, and Oz, 1.3; earlier released versions, most of which were used for the described processes along the way, do not provide all the features discussed here.
3 Requirements
The following requirements are for the process itself, independent of whether it is enacted via process/workflow technology or solely by human labor. Some of these requirements are generic with respect to componentization of many medium-sized legacy systems:

• The process should be based on notions of subsystems that are specific to the given legacy system and of components that are reasonably independent of any particular target system. We refer to the latter as context-free, even though it is expected that components will assume some API or other facilities on the part of any context in which they may be used, and these cannot be expected to be standardized in the general case.

• The process should specify steps for identifying existing subsystems according to some chosen criteria, e.g., cohesive major functionality; for evaluating their potential for separation from the rest of the system — both in terms of the intricacies of their dependencies on the rest of the system and the intricacies of the dependencies of other parts of the system on them; and for systematically removing dependencies when possible and for reducing the remaining dependencies to syntactically and semantically clean interfaces. We say “intricacies” here rather than, say, “coupling”, because the latter implies an objective quantified metric while what we have in mind may sometimes be more subjective and qualitative.

• The process should support a small closely cooperating team, allowing for multiple software engineers to perform related activities at the same time, and include steps for checking and restoring consistency among their changes at reasonably frequent checkpoints, and for ensuring that participants are promptly notified of work undertaken by their teammates that potentially affects their work.

Three additional requirements concerned with the incremental nature of the desired CBSE process(es) were peculiar to the needs of our project but are probably not all that unusual:

• The process should support integration of the components into multiple distinct systems at the same time. This is problematic when the components themselves import facilities from their context, which would be different in each case. For C source code, this means at minimum that different compilations must be able to find the imported include files in different places and produce different binaries for each target system into which they are incorporated.

• The process should also encompass “new” development of the system, both inside and outside the components (by outside here we mean the portion of the system that was deemed not separable into a component or that serves as “glue” among components), consistent with componentization activities, without requiring distinct “versions” of the whole system but while still permitting participants in the “new” work to operate relatively independently of the CBSE team. However, there may simultaneously be multiple versions of the same subsystem/component within the same project repository, in varying stages of development (this is also an implication of the previous requirement).

• The process should include steps leading to frequent system builds, integration testing, system testing and deployment of the partially componentized system, so that the new features can quickly be brought into use.

We had no particular a priori requirements in mind for the process/workflow technology, but instead intended to best exploit the available technology to accomplish our componentization goals.
4 First Try: OzMarvel
OzMarvel was our first attempt at a CBSE process, written in Marvel’s process modeling notation and running as a Marvel environment instance. We developed it to assist us in pulling subsystems out of our target system to rewrite them into components, and at the same time introduce substantial new functionality into the target system. Three specific components were initially envisioned; another was added later while using EmeraldCity. Together these encompassed most but not all of the major externally visible functionality of the target system. We also planned to use OzMarvel to help us perform a set of experiments concerned with integrating our components into externally developed systems and replacing portions of the target system with externally developed components. These experiments were intended to provide feedback into the functionality and interfaces of the components, as well as the structuring of the target system to enable component replacement, and are discussed in depth elsewhere [5, 13, 6, 11].

The OzMarvel data schema structures the environment’s data repository into two main folders. One folder represents a set of teams, each consisting in turn of a group of private developer workspaces. We used only two teams, representing current and past lab members, respectively, but the schema allows for an arbitrary number; this particular aspect of the structure was useful primarily because Marvel’s user interface navigates folders hierarchically, and a folder containing only active users is less cluttered than one also containing
all previous users. The same division would presumably have been useful in an entirely manual process enactment, for similar reasons, and generally the data schema could have been applied directly to the file system without involving a PCE.

The data schema defines a workspace folder as containing a set of C source code and header (include) files checked out by that user, locally generated object code binaries and linked executables, and references to libraries in a shared area needed to compile and build local executables. There are also references used for testing executables in one workspace together with executables from one or more other workspaces as well as the shared area (e.g., one user might be working on a new client while another works on a new server that must be tested together due to a change in the client/server protocol).

The other main folder in the repository contains a set of projects, each representing a shared area. We had three, corresponding to the current baseline version of the target system and its main components, work progressing independently of that system (e.g., the experimental integrations with imported systems), and a frozen copy of an older version of the target system as delivered to a funding agency. The first two projects are collectively referred to as the “Master Area”.

Each project folder consists of a set of what the schema calls systems, a component pool, a module pool, and a pool of external libraries. Each system folder in turn contains a set of subsystems, each corresponding to a distinct executable (a distributed system may involve multiple cooperating executable programs). (Note these subsystems are not the same as the legacy subsystems, referred to throughout this paper, that were to be re-engineered into components; multiple components may be linked together in the same executable.) For example, at the time we migrated the target system out of OzMarvel into EmeraldCity, it had 19 subsystems: three variants of the server, four kinds of client, three translators for different notations, the daemon for automatically bringing up the server when a client starts up, and several utilities for managing running configurations of the target system.

Libraries represent object code archives (i.e., Unix “.a” files) that may be linked into subsystems or components, together with their header files needed for compilation of importing code. For instance, OzMarvel had external libraries for gdbm (used as the backend of the target system’s native object management system), for the PCTE object management system (which replaced gdbm in two experimental variants of the target system), and for the socks secure TCP/IP sockets package (which allowed the target system to operate through organizational firewalls), along with motif, xview, termcap, etc. libraries imported by particular target system clients.

Each subsystem folder referenced the several context-free components and external libraries (in the component and library pools, respectively) from which it was constructed, and directly contained special-purpose modules for “gluing” those components and libraries together to construct the specific subsystem. The components in turn referenced the context-free modules (in the module pool) from which they were composed, and also contained local “glue” files for tailoring their modules to provide the functionality needed for that component. Each module (which could be and often was decomposed into a hierarchy of (sub)module folders) contained its source files, object code archive, and public (to using modules) and private (for use only within the module) header files, as well as references to external (to the module) header files needed for compilation. We emphasize context-free here, meaning that the components, modules, etc. are not supposed to make any assumptions about the systems and subsystems in which they were to be used and, at least in principle, were amenable to “plug-n-play”.

The main process tasks are concerned with checkout/checkin of files from/to the Master Area with respect to developer workspaces; editing source and header files; always running certain analysis tools on those files after every change; compilation and system build; testing; and notifying potentially affected users when changed files are checked in. Workflow automation was very useful for conducting the more tedious aspects of this process. For example, cross-references between code folders were automatically generated by capturing the results of the standard etags and our own inverse revtags analysis tools after every code edit. Other process support was concerned with locating the appropriate header files for compilation and binaries for linking executables.

Over its lifetime, OzMarvel was actively used by 14 people (not all at the same time). Although the basic philosophy and design remained the same, OzMarvel was modified several times to fix bugs in the process model and to improve multi-user support; see [2] for a brief discussion of the schema and process evolution utility, called Evolver, used by Marvel and later Oz to upgrade the state of an in-progress process to match the semantic constraints of a new process model. The final process evolution left OzMarvel with 139 tasks (only 26 task names appeared in the user menu, due in part to overloading and in part to the marking of 75 tasks as auxiliary, that is, for internal propagation purposes only); 48 classes of folder (13 of them virtual superclasses, such as VERSIONABLE, which would never be instantiated); and 37 activity scripts.

One limitation of OzMarvel was that it was tuned to the desired end-product, in particular, composition of subsystems from context-free components and modules, yet it also supported the original legacy subsystems, which were definitely not context-free. They were heavily intertwined with the rest of the target system; for example, direct pointers into data structures maintained by one particular subsystem were strewn throughout the code, appearing in nearly all but the lowest-level subroutines. The schema did not provide any specific representation for such pre-componentized code, and in practice what we call subsystems here were represented as (large) modules. Nor did the process model directly address upstream activities such as how to locate modules that could/should be included in/excluded from re-engineered components, or include tasks to perform such re-engineering, other than the generic checkout/checkin, editing, system build and testing in private workspaces. Instead, the human users had to “know” when and how to do the requisite work under the guise of the generic tasks.

OzMarvel’s multi-level folder structure also proved much too complicated, evidenced by the relatively large proportion of auxiliary tasks needed. Recall that auxiliary tasks are used for automation, and are not generally visible to the human participant, although similar process steps would presumably be useful to guide humans performing the same work manually.
For example, OzMarvel’s tasks to automate maintenance of each source file folder’s set of references to the folders representing each of the directly or transitively included header files were particularly intricate (and buggy). A header file might include other header files with arbitrary recursion depth, and auxiliary tasks were triggered whenever the source file or one of the header files was edited in a way that affected header file inclusion. References to the header files for imported interfaces were necessarily different for each component/subsystem context in which the source file was used, as were the libraries with which its object code was linked, but the multi-level structure meant these references had to be gathered from the hierarchy rather than found all in one place.

Further, multiple modules performed the same function with intentionally the same interface, i.e., there were at least two each of three major modules in the module pool, corresponding to the original native modules from the legacy target system vs. the corresponding code for our new components. Thus the tools we had used for code cross-referencing in the earlier CMarvel environment, standard Unix etags (which keys identifier uses to their definitions) and our home-grown revtags (the reverse), which assume a flat name space, did not operate properly in OzMarvel. Renaming solved this problem, e.g., prefixing each subroutine name with its component name, but made it difficult to plug-replace one component with another since code had to be edited (or preprocessed) for each subsystem context. This difficulty with the activity scripts would have been obvious from the start had we realized then that modules could not realistically be re-engineered into components “in place” while enhancements and integration experiments were in progress simultaneously.
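To make the renaming work-around and its cost concrete, consider a hypothetical fragment (the identifiers below are illustrative only, not taken from the actual code base). Prefixing each entry point with its component name keeps etags and revtags usable in a flat name space, but ties every call site to one realization, so plug-replacing one component with another means editing or preprocessing the callers:

    /* Two interchangeable object-management modules in the flat module pool;
     * without the prefixes their entry points would have identical names.
     * All names here are invented for illustration. */
    void *native_om_create_object(int class_id);   /* legacy subsystem code   */
    void *oodb_om_create_object(int class_id);     /* re-engineered component */

    /* Each subsystem context then has to select one realization explicitly,
     * e.g., via a preprocessor switch, which defeats transparent replacement. */
    #ifdef USE_OODB
    #define om_create_object oodb_om_create_object
    #else
    #define om_create_object native_om_create_object
    #endif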
In summary, OzMarvel did not meet many of our generic CBSE requirements. This is not surprising since these requirements were formulated in the later stages of using OzMarvel, for development of EmeraldCity, when we had a much better understanding of the issues and challenges. OzMarvel did not even do a very good job of meeting our peculiar requirements, which were known a priori, in large part because we overestimated the amount of human time and effort available for componentization work and underestimated the human time and effort needed.

5 Second Try: EmeraldCity
We originally imagined that we would construct EmeraldCity by evolving OzMarvel, initially as a single-site Oz process and then distributing portions of it among a few sites, but that proved far too complex. So we designed the new process from scratch, retaining only selected portions of OzMarvel’s data schema. Between the launching of OzMarvel and the completion of immigration into EmeraldCity (using a utility presented elsewhere [15]), our code base nearly doubled from 155k to 280k lines; this figure does not include any external libraries or systems, e.g., for X Windows or those used in our integration experiments. After that we consolidated and replaced code, with relatively little growth in the target system (not counting development of OzWeb, an extension of Oz that operates on World Wide Web entities, discussed elsewhere [9]; recall that our target system was itself Marvel/Oz).

EmeraldCity is a set of several processes that work together, following Oz’s International
Alliance metaphor, rather than a single process like OzMarvel. Each EmeraldCity process is associated with its own independent data repository, and is served by a separate environment instance, unlike OzMarvel, whose single server operated entirely within a single repository. Recall that a single process model defines many workflows, which may be enacted concurrently, but that Oz permits interoperation of multiple autonomously devised processes — although in EmeraldCity the processes were designed top-down to cooperate with each other, as in [14].

EmeraldCity consists of two shared sites and an arbitrary number of workspace sites, which are usually personal to a specific developer (although workspaces can be and have been shared). One of the shared sites corresponds to the “Master Area” folder in OzMarvel, whereas the other, the “Assembly Area”, is used only during re-structuring, to enable replacement of a legacy subsystem in the target system with its corresponding component. Workspace sites are virtually identical to each other, although they need not be, with the main customization being whether or not they form Treaties with the Assembly Area. Workspaces participating in the CBSE effort cooperate with the Assembly Area, whereas other workspaces are unaware of its existence. Again, similar structures could have been imposed on a file system, or a set of file systems on different hosts corresponding to Oz sites, without involving a PCE. The greatest number of co-existing workspaces was 16; this number varied as students joined and left the project, or cloned their workspaces to perform relatively independent work in each one.

Figure 1 shows the (not terribly readable) hierarchical view from the “Master Area” site (oz master), showing the environment’s local data repository graphically. Figure 2 shows a (somewhat more readable) horizontal view from that site, with an open connection to the pds site (Peter Skopp’s workspace).

Figure 1: Hierarchical Master Area Display

Figure 2: Horizontal Master Area and Workspace Display

All EmeraldCity sites share the same data schema (this is not a requirement of Oz, although obviously sites participating in the same Treaty must have a common subschema for the data types manipulated by Treaty tasks). A project folder is composed of a set of systems, a single prototype header file, a set of other header files, and a program pool. (A prototype is essentially a forward declaration of a C function signature, as it must be used in source files whose object code will link with that function’s code; prototypes are a required feature of ANSI C.) The prototype header file is automatically constructed by auxiliary process tasks that concatenate the contents of the header files associated with every library in the project, which are in turn constructed automatically by activity scripts as files are edited and compiled, and archived into a particular library. This workflow could be performed manually, of course, but it would then be tedious and error-prone to keep the prototype file up to date while the code was undergoing frequent changes by several developers. EmeraldCity requires that the prototype header file be included by all other header files in the project. Sections of the prototype header file are guarded with preprocessor variables, so that only the relevant subset of the prototypes is used during compilation of a given source file and there are no naming conflicts. The other header files could contain only type definitions, no prototypes, and header files were not permitted to recursively include other header files. These process constraints are all enforced by Oz, so there is no need for a human to check them by hand or to remember to invoke a tool to perform that checking.
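As a concrete illustration, the following is a minimal sketch of such a guarded prototype header; the guard variables, file name, and function names are invented for this paper rather than taken from the actual code base:

    /*
     * proto.h -- hypothetical sketch of the automatically generated prototype
     * header file.  Each section is contributed by the header files of one
     * library and is guarded by a preprocessor variable, so a given
     * compilation sees only the prototypes of the libraries it actually uses
     * and naming conflicts between alternative realizations are avoided.
     */

    #ifdef USE_OMS_LIB              /* object management library (invented) */
    void *oms_create_object(int class_id);
    int   oms_delete_object(void *obj);
    #endif

    #ifdef USE_NET_LIB              /* communication library (invented) */
    int   net_send_request(int socket_fd, const char *request);
    int   net_read_reply(int socket_fd, char *buffer, int length);
    #endif

A source file belonging to a subsystem that links against the object management library but not the communication library would then be compiled with something like -DUSE_OMS_LIB, and, per the constraints above, every other header file in the project includes this prototype header.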
An EmeraldCity program is analogous to a context-free component in OzMarvel: a program folder contains a set of modules, a set of libraries, a set of (local) header files, and a set of references to header files in other programs. Each module folder contains its source files, and references the one archive holding its object code and the appropriate header files from its containing program; the modules assume the context of their program, but are intended to be reusable in any subsystem. An EmeraldCity system folder contains a set of subsystems, as illustrated in Figure 3. A subsystem folder consists of a set of context-sensitive components, the subsystem’s executable, an archive library for “glue” code between the components, and the source and object code for the main file (required by C convention for every executable). Each context-sensitive component folder contains source files for “glue” code within that component, a reference to the single library representing the entire component, and a hierarchy of (sub)component folders. These components are configured for use in one specific subsystem — thus the designation context-sensitive.

Figure 3: Zoom Into Systems Hierarchy in Master Area

Each workspace site consists of a set of “local projects”, which organize checked-out source and header file folders according to their subsystem contexts. The gist of the CBSE aspect of the EmeraldCity suite of processes is that code files slated for componentization work are checked out of the Master Area site into a workspace site (via a Summit) for changes,
and then checked into either the Assembly Area site or back into the Master Area (more Summits, i.e., interoperations among multiple processes). The Assembly Area process allows only completely converted code to be checked in. However, other code can still be checked into the Master Area, permitting unrelated development of portions of the target system and the integration experiments with foreign systems to proceed unhampered. This was very useful, since not all of the developers were involved in the CBSE effort; many had other pressing work to do that we wished to disrupt as little as possible.

Figure 4 shows the hierarchical view from the heineman site, which allows opening of only the Master Area and Assembly Area (called proj students in the implementation for obscure historical reasons).

Figure 4: Workspace Display

CBSE-oriented process tasks within workspace sites range from using a home-grown table-driven tool that semi-automated the lexical aspects of interface changes, by matching code patterns that should be replaced with calls to a specified application programming interface (API); to manually recoding individual subroutines to realize such an API; to complete module redesign and rewriting.
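As a hypothetical before/after fragment of the kind of lexical change the table-driven tool helped with (none of the type, field, or function names below come from the actual code base), a direct reach into another subsystem's data structures is replaced by a call through the new component's API:

    /* Invented types standing in for the legacy object manager's internals. */
    typedef struct attr   { char *str_value; } Attr;
    typedef struct object { Attr attrs[16]; }  Object;
    typedef long Oid;   /* handle exported by the re-engineered component */

    /* Published API of the new component (invented signature). */
    extern char *oodb_get_string_attr(Oid obj, const char *attr_name);

    #define NAME_SLOT 0

    /* Before: the caller depends on the internal layout of another subsystem. */
    char *get_name_before(Object *obj)
    {
        return obj->attrs[NAME_SLOT].str_value;
    }

    /* After: the same information is obtained through the component's API,
     * so the internal representation can change, or be replaced, freely. */
    char *get_name_after(Oid obj)
    {
        return oodb_get_string_attr(obj, "name");
    }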
Subsystem builds in each CBSE workspace look for non-local object code first in the Assembly Area and, only if not found there, in the Master Area; other workspaces import only from the Master Area. After re-structuring with respect to a particular component (or, potentially, a set of components) is complete, the entire code base is copied from the Assembly Area to the Master Area, and the Assembly Area is reinitialized for the next re-structuring, if any; if none is on the horizon, the Treaties between the CBSE workspaces and the Assembly Area are revoked.
For example, over the summer of 1995 we converted the target system from its native pointer-based object management system to using the OID-based object-oriented database (OODB) component previously re-engineered from the native subsystem. One native component had already been replaced using OzMarvel, and two other component conversions were to come. This was our most substantial re-structuring effort, affecting approximately 150,000 lines out of 280,000. The incremental aspect of the conversion, while other unrelated enhancements were in progress, would not have been possible without the binary compatibility of the old and new interfaces, due to C’s allowance of type casting between integers and pointers. And it would have been exceedingly difficult without Oz’s process support, including the multi-site interaction.
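The binary-compatibility point can be illustrated with a small, hypothetical sketch (all names are invented, and the real scheme may have differed in detail): because C permits casting between pointers and integers of sufficient width, a field that used to hold a raw pointer can hold an OID without changing its size or layout, so converted and not-yet-converted object code could be linked together during the incremental conversion:

    #include <stdio.h>

    typedef unsigned long Oid;   /* assumed here to be wide enough for a pointer */

    struct legacy_node {
        Oid child;               /* was "struct legacy_node *child" before */
    };

    /* New-style accessor: resolves an OID to in-memory data.  In this sketch
     * the OID simply carries the old pointer value, which is what keeps the
     * old and new interfaces binary-compatible during the transition. */
    void *oodb_deref(Oid oid)
    {
        return (void *)oid;
    }

    int main(void)
    {
        int payload = 42;
        struct legacy_node n;

        n.child = (Oid)&payload;                       /* old code stored a pointer */
        printf("%d\n", *(int *)oodb_deref(n.child));   /* new code resolves an OID  */
        return 0;
    }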
The same basic approach was also used for other re-engineering work not concerned with componentization, such as converting our previously Kernighan and Ritchie C code base to the ANSI standard. The work there proceeded more gradually, rather than as a major flurry. Code checked out into a workspace is converted following process tasks involving the Gnu protoize tool, extensions to the activity scripts for our conventional development tools, and manual changes to header files. The Master Area enforces that only ANSI-compliant code can be checked in (i.e., the ANSI C compiler using the strictest options generates no error or warning messages), so unlike the OODB re-structuring above, all users were forced to convert to ANSI C in tandem with any other work on a given file. This decision was embodied in the CBSE process; the opposite could have been implemented instead.
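For concreteness, here is a tiny invented example of the kind of change this entailed, converting a Kernighan & Ritchie style definition into an ANSI prototype plus definition (the function and type names are illustrative only):

    struct table { int dummy; };

    #ifdef BEFORE_CONVERSION
    /* Kernighan & Ritchie style, as in the original code base. */
    int lookup(tbl, key)
    struct table *tbl;
    char *key;
    {
        (void)tbl; (void)key;
        return 0;
    }
    #else
    /* ANSI style: the prototype goes into the automatically maintained
     * prototype header file, and the definition carries full parameter
     * types, so the strict compiler check at Master Area checkin passes. */
    int lookup(struct table *tbl, char *key);

    int lookup(struct table *tbl, char *key)
    {
        (void)tbl; (void)key;
        return 0;
    }
    #endif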
EmeraldCity was actively used by about 15 people (not all at the same time). The Master Area site consists of 78 tasks (26 distinct names visible in the task menu, and coincidentally 26 auxiliary tasks for internal propagation), 27 classes (6 of them virtual), and 32 activity scripts. Twenty-one of these tasks are exported via Treaties to workspaces for use in checkin/checkout, local build, etc. A typical workspace has 68 tasks (24 task names, 19 strictly auxiliary), the identical 27 classes as the Master Area, and (coincidentally) 27 scripts. The Assembly Area process is the same as for a workspace, except for three special tasks used internally during re-structuring plus two tasks exported for the three-way Treaties with a CBSE workspace and the Master Area. One of the former three task definitions is shown in Figure 5 and one of the latter two in Figure 6, both in the appendix; the other task definitions are analogous.

In summary, EmeraldCity restricts the contents of header files to avoid transitive dependencies, simplifies OzMarvel’s notion of pools, and distinguishes context-free from context-sensitive representations of components. Its data repository organization solved the naming difficulties that permeated OzMarvel: the files scanned by the cross-referencing tools, etags and revtags, are always encapsulated in the appropriate context. Late in EmeraldCity’s lifespan we added a home-grown tool, called Hi-C, to generate HTML (HyperText Markup Language) from any etags code base, enabling EmeraldCity users to view code and follow automatically generated hypertext links from uses to their definitions using World Wide Web browsers.

EmeraldCity fulfilled all except our potential component identification and evaluation requirements for CBSE processes, both generic and peculiar to our project, but we could have done better in some areas. For instance, all members of the lab’s mailing list were notified of all code checkins to the Master Area, whether or not their own work was actually affected; in practice this meant everyone (the first author, the faculty member directing the lab, was excluded from this list by choice, but all the students were stuck) was bombarded with messages that were consequently rarely read, defeating the purpose of automated notification.
6 Conclusion
Our re-engineering of components from legacy subsystems, re-structuring of the target system to reconstruct it from these components, and experiments integrating components with/from foreign systems necessitated our development and use of two generations of component-based software engineering processes enacted by two generations of process-centered environments. Our processes focused on the nitty-gritty but mandatory details of code understanding, code conversion and configuration management, and ignored upstream aspects of the lifecycle (which were performed off-line).

Although some of the problems encountered in OzMarvel were due to peculiarities of C, we’d expect to run into analogous difficulties using most programming languages — given that few production languages of the late 1980s and early 1990s were designed with “plug-n-play” componentry in mind. Oz’s workflow automation augmented with support for process interoperability (Treaty and Summit) proved an immense boon to our effort, and we expect it would apply similarly to componentizations of other medium-sized legacy systems. The incremental nature of the CBSE processes, and the corresponding support from our process/workflow technology, were critical in being able to perform re-structuring and re-engineering without significantly interfering with “new” development work on the same code base.
Acknowledgements

The preliminary version of OzMarvel was developed by Andrew Tong and Steve Popovich. Issy Ben-Shaul contributed to the design of EmeraldCity. Wenke Lee participated in the OODB component replacement.

Marvel, Oz, and the latter’s components are freely available, but without support; information regarding downloading is available at http://www.psl.cs.columbia.edu/, or send email to
[email protected]. This paper is based on work sponsored in part by the Defense Advanced Research Projects Agency under ARPA Order B128, monitored by Air Force Rome Lab F30602-94-C-0197, in part by National Science Foundation CCR-9301092, and in part by the New York State Science and Technology Foundation Center for Advanced Technology in High Performance Computing and Communications in Healthcare NYSSTF-CAT-95013. Heineman was supported in part by an AT&T Fellowship. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US or NYS government, DARPA, Air Force, NSF, NYSSTF or AT&T.
A 3-page abstract of this paper appeared under the title “On the Yellow Brick Road to Component-based Product Lines”, in the 10th International Software Process Workshop, Ventron, France, June 1996.
References

[1] Naser S. Barghouti. Supporting cooperation in the Marvel process-centered SDE. In Herbert Weber, editor, 5th ACM SIGSOFT Symposium on Software Development Environments, pages 21–31, Tyson’s Corner VA, December 1992. Special issue of Software Engineering Notes, 17(5), December 1992. ftp://ftp.psl.cs.columbia.edu/pub/psl/sde92.ps.Z.

[2] Israel Z. Ben-Shaul and Gail E. Kaiser. Process evolution in the Marvel environment. In Wilhelm Schäfer, editor, 8th International Software Process Workshop: State of the Practice in Process Technology, pages 104–106, Wadern, Germany, March 1993. Position paper. ftp://ftp.psl.cs.columbia.edu/pub/psl/ispw8.ps.Z.

[3] Israel Z. Ben-Shaul and Gail E. Kaiser. A paradigm for decentralized process modeling and its realization in the Oz environment. In 16th International Conference on Software Engineering, pages 179–188, Sorrento, Italy, May 1994. IEEE Computer Society Press. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-024-93.ps.Z.

[4] Israel Z. Ben-Shaul and Gail E. Kaiser. Federating process-centered environments: the Oz experience. Automated Software Engineering, 5(1):97–132, January 1998. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-006-97.ps.gz.

[5] George T. Heineman and Gail E. Kaiser. An architecture for integrating concurrency control into environment frameworks. In 17th International Conference on Software Engineering, pages 305–313, Seattle WA, April 1995. ACM Press. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-021-94.ps.Z.

[6] George T. Heineman and Gail E. Kaiser. The CORD approach to extensible concurrency control. In 13th International Conference on Data Engineering, pages 562–571, Birmingham, UK, April 1997. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-024-95.ps.gz.

[7] George T. Heineman, Gail E. Kaiser, Naser S. Barghouti, and Israel Z. Ben-Shaul. Rule chaining in Marvel: Dynamic binding of parameters. IEEE Expert, 7(6):26–32, December 1992. ftp://ftp.psl.cs.columbia.edu/pub/psl/expert92.ps.Z.

[8] Gail E. Kaiser, Israel Z. Ben-Shaul, Steven S. Popovich, and Stephen E. Dossick. A metalinguistic approach to process enactment extensibility. In Wilhelm Schäfer, editor, 4th International Conference on the Software Process: Improvement and Practice, pages 90–101, Brighton, UK, December 1996. IEEE Computer Society Press. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-016-96.ps.gz.

[9] Gail E. Kaiser, Stephen E. Dossick, Wenyu Jiang, Jack Jingshuang Yang, and Sonny Xi Ye. WWW-based collaboration environments with distributed tool services. World Wide Web Journal, 1:3–25, 1998. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-003-97.ps.gz.
[10] David Krieger and Richard M. Adler. The emergence of distributed component platforms. Computer, 31(3):43–53, March 1998.

[11] Wenke Lee and Gail E. Kaiser. Interfacing Oz with the PCTE OMS. Technical Report CUCS-012-95, Columbia University, Department of Computer Science, June 1997.

[12] Scott M. Lewandowski. Frameworks for component-based client/server computing. ACM Computing Surveys, 30(1):3–27, March 1998.

[13] Steven S. Popovich and Gail E. Kaiser. Integrating an existing environment with a rule-based process server. Technical Report CUCS-004-95, Columbia University, Department of Computer Science, August 1995. ftp://ftp.psl.cs.columbia.edu/pub/psl/CUCS-004-95.ps.Z.

[14] Izhar Shy, Richard Taylor, and Leon Osterweil. A metaphor and a conceptual framework for software development environments. In Fred Long, editor, Software Engineering Environments: International Workshop on Environments, volume 467 of Lecture Notes in Computer Science, pages 77–97, Chinon, France, September 1989. Springer-Verlag.

[15] Michael H. Sokolsky and Gail E. Kaiser. A framework for immigrating existing software into new software development environments. Software Engineering Journal, 6(6):435–453, November 1991. ftp://ftp.psl.cs.columbia.edu/pub/psl/sejournal91.ps.Z.
Appendix

# rule signature
convert_CLASS [?c:COMPILABLE, ?cf:PROTOTYPE]:
# bindings of local variables to results of objectbase queries
    (and (exists LOCAL_PROJECT ?lp suchthat no_chain (ancestor [?lp ?c]))
         # Use local version of prototype file
         (forall PROTOTYPE ?LPT suchthat no_chain (member [?lp.proto ?LPT]))
         # Local HFILEs
         (forall INC ?li suchthat no_chain (member [?lp.inc ?li]))
         # Installed Interface (from set_subsystem[] rule)
         (forall INC ?ii suchthat no_chain (member [?lp.interface ?ii])))
    :
# condition
# If the C file has not yet been compiled, this rule can fire.
# The compilation changes the status of the CFILE to Compiled on success.
    (and no_forward (?ii.recompile_mod = false)
         no_chain (?cf.Name = "CLASS_PTR")
         no_forward (?li.recompile_mod = false))
# activity
    CONVERSION_TOOLS converter ?c.contents ?c.compile_log ?c.object_code
        ?c.proto ?li.directory ?ii.directory ?LPT.contents
        "-DMOVING_CLASS -Wall" ?lp.sys_includes ?lp.compiler_directives
        ?cf.contents
# success and failure effects
    (and (?c.compile_status = Compiled)
         no_chain (?c.object_time_stamp = CurrentTime));
    (?c.compile_status = ErrorCompiled);

Figure 5: OODB Replacement Process Task
# Build with master Main file (in SUBSYSTEM or SYSTEM.common_main)
# rule signature
build[?lp:LOCAL_PROJECT, ?p:PROJECT, ?mp:LOCAL_PROJECT]:
# bindings of local variables to results of objectbase queries
    (and (exists SUBSYSTEM ?s suchthat
             (and no_chain (ancestor [?p ?s])
                  no_chain (?s.Name = ?lp.subsystem)))
         (exists BIN ?lb suchthat no_chain (member [?lp.bin ?lb]))
         (forall COMPILABLE ?C suchthat
             (or no_chain (member [?lp.files ?C])
                 no_chain (member [?mp.files ?C])))
         # verify that there is no local Main file
         (forall CFILE ?lm suchthat
             (or (and no_chain (linkto [?lp.main ?lm])
                      no_chain (ancestor [?lp ?lm]))
                 (and no_chain (linkto [?mp.main ?lm])
                      no_chain (ancestor [?mp ?lm]))))
         # get master main file
         (forall CFILE ?main suchthat
             (and no_chain (linkto [?s.main ?main])
                  no_chain (ancestor [?s ?main]))))
    :
# condition
    (and no_chain (?lm.Name = "")   # Not-Exists condition
         no_forward (?C.compile_status = Compiled)
         no_chain (?main.compile_status = Compiled))
# activity
# Use main object code with the SUBSYSTEM.object
    COMBINE_TOOLS build_local ?lb.executable ?lp.build_log ?s.libraries
        ?s.build_order ?s.med_libraries ?s.object_code ?C.object_code
# success and failure effects
    (?lb.build_status = Built);
    (?lb.build_status = NotBuilt);

Figure 6: 3-Site Build Treaty Process Task