National Research Council Canada

Conseil national de recherches Canada

Institute for Information Technology

Institut de technologie de l’information

ERB-1025

Commercial Realtime Software Needs Different Configuration Management

W.M. Gentleman, S.A. MacKay, D.A. Stewart, and M. Wein

November 1989

NRC No. 30939

This report also appears in “Proceedings of the Second International Workshop on Software Configuration Management,” Princeton, NJ, October 1989. pp. 152–161.

Copyright 1989 by National Research Council of Canada

Copyright 1989 par Conseil national de recherches du Canada

Permission is granted to quote short excerpts and to reproduce figures and tables from this report, provided that the source of such material is fully acknowledged.

Il est permis de citer de courts extraits et de reproduire des figures ou tableaux du présent rapport, à condition d’en identifier clairement la source.

Additional copies are available free of charge from:

Des exemplaires supplémentaires peuvent être obtenus gratuitement à l’adresse suivante:

Publication Office Institute for Information Technology National Research Council of Canada Ottawa, Ontario, Canada K1A 0R6

Bureau des publications Institut de technologie de l’information Conseil national de recherches du Canada Ottawa (Ontario) Canada K1A 0R6

Table of Contents

Abstract/Résumé ......... iv
Introduction ......... 1
The Essence of DaSC Configuration Management ......... 2
DaSC Configuration Management in Harmony ......... 4
Support Tools for DaSC ......... 6
How DaSC Differs from Traditional Configuration Management ......... 9
    No Check-out ......... 9
    Long Developer Assignments ......... 9
    No Configuration Description Language ......... 9
    File System Rather than Conventional Database ......... 9
    Conditional Compilation Banned ......... 9
    Inclusion Files not Factored ......... 10
    Evolution in a Single Sequence with no Branches ......... 10
    Progress Monitoring not in Configuration Management System ......... 10
Conclusions ......... 10
References ......... 11


Abstract

Arguments are presented as to why integrated, monolithic configuration management is not well suited to commercial realtime systems. An alternative approach to configuration management that over several years we have found to be effective and widely usable is described. This approach, Database and Selectors Cel (DaSC), separates treatment of versions that exist simultaneously from the evolution of those versions over time. Versions that exist simultaneously are represented by selectors from a common database. Evolution is represented by layers, as in the film animator’s cel.

Résumé On explique pourquoi la gestion de configuration intégrée monolithique n’est pas bien adaptée aux systèmes commerciaux exploités en temps réel. Une approche de rechange en matière de gestion de configuration qui, sur une période de plusieurs années, s’est révélée efficace et d’application étendue est décrite. Cette approche, Database and Selectors Cel (DaSC), (cellulo de base de données et sélecteurs), traite séparément les versions qui existent simultanément et les versions modifiées avec le temps. Les versions qui existent simultanément sont représentées par des sélecteurs pris dans une base de données commune. Les versions modifiées sont représentées par des couches comme les cellulos des dessins animés.


Commercial Realtime Software Needs Different Configuration Management

W. M. Gentleman, S. A. MacKay, D. A. Stewart, and M. Wein

Introduction

Configuration management [1] is a term used to describe two activities in building complex systems. One is managing the development of the components from which a complex system is built, and integrating those components to form that system. Alternatively, the term is used to describe the selection of appropriate components to produce a particular instance from a family of related complex systems. Attention to issues of configuration management has been recognized to have cost and quality benefits for large and evolving software projects. Many individual tools and complete support environments have been built for configuration management; some have become widespread [2–6]. However, most of these existing support facilities have made assumptions that are inappropriate in the computing milieu common in commercial realtime software development, such as the development of energy management systems or robot control systems [7]. A completely different approach is needed. After describing the motivations for a different approach, we will describe Database and Selectors Cel (DaSC), the methodology used with Harmony.* The Harmony project [8, 9] at the National Research Council of Canada is aimed at improving all aspects of building embedded systems and includes the evolution of the portable multitasking multiprocessor Harmony operating system, which was developed early in the project and has been available commercially for several years.

The computing milieu of commercial realtime software development differs from that of other computing in several aspects. This work is typically done by small companies, or by groups in large companies that are sufficiently isolated from the other company activities that they behave like small companies. The products typically are cost sensitive, often because they are sold in a competitive situation. One important consequence of small companies and cost sensitivity is that the development facilities are often limited to microcomputer-based commodity computers, either not networked or with only low bandwidth networking (timesharing systems or networks of powerful workstations being too expensive). Another consequence of cost sensitivity is the need to pursue economies of scale in the development process, within and across products. The inexpensiveness and availability of commercial mass market tools can outweigh advantages of custom ones. Software reusability is a major issue, but software components are even more important — the distinction being that the former term is often used to include modifying software from an existing base for a new product, where the latter term implies using exactly the same module in several products, retrofitting where necessary if new applications force enhancement. Software components provide economy of scale because they do not need to be debugged again each time and indeed can significantly enhance reliability when a new product can rely on components with extensive exposure in the field. They also reduce how much maintainers and installers must learn.

Perhaps the main consequence of cost sensitivity, however, is the need to amortize costs by selling the product to multiple installations, indeed as many as possible, accommodating dissimilarities between installations. This inevitably means making the product available on many platforms, which in the business of commercial realtime software is a much deeper commitment than, say, in office automation. First, the range of plausible platforms is far wider than just a few Unix boxes or the generic PC clone — it includes assemblages of board level products on a common bus that may not have been combined elsewhere. Second, the realtime or embedded system nature means that the software cannot be device independent in a traditional sense because its whole purpose is to control peripherals, sensors, and actuators, and while there will be a level of abstraction in the software at which that control is independent of details of the particular peripherals, sensors, and actuators, these devices often do not fit well into standard device independent abstractions such as stream I/O. Moreover, the devices themselves are not standardized, so the application product cannot rely on standard operating system support but must take the responsibility of driving the devices actually present in any particular installation.

Beyond hardware portability, maximizing market opportunities requires that the product be customizable to different customers’ needs by providing optional features as well as choices for tradeoffs in performance and resource requirements. Such customization cannot imply extensive programming, and so products need to be configurable and open; configurable in that optional features can be left out to reduce cost and that choices between

* Mark reserved for the exclusive use of Her Majesty the Queen in right of Canada by Canadian Patents and Development Ltd. / Société canadienne des brevets et d'exploitation Ltée.


available implementations are inexpensive to support, and open in that features and implementations not provided with the standard product can be added with minor perturbation. The documentation for multiple version software must itself be multiple version because generic documentation which attempts to span an entire product family either is too vague to be useful for specific installations or drowns the reader in excessive irrelevant detail that may well not have tracked product evolution (as may be seen in third party automobile manuals).

Another distinguishing aspect of commercial realtime software development is that a shared file system containing the project is often not viable. Beyond the possibility (quite real on personal computers) that one file server might not cope with the load, the most obvious source of this restriction is that developers are often geographically distributed, not just because groups may be permanently located at remote sites, but also in that developers may temporarily be remotely located (as when a product is being customized to an installation at that location). Especially in temporary situations, the cost and time taken to set up adequate networking may be unjustifiable.

More significantly, however, commercial realtime software development often involves more than one company. The software developers often belong to a different company than does the software installer, much less the user. Products are often developed by consortia, rather than by single companies. There are serious questions of how the different parties maintain their proprietary rights, how each party restricts visibility into management aspects of its part of the project from its partners, and how each partner protects itself from default by another partner (possibly by accommodating a substitute for that portion of the project). Putting all the project information from all partners onto one file server is inconceivable.

The deliverable product of commercial realtime software development often differs from those that are produced in other development environments. Products are typically intended to run in a dedicated computer in some embedded system, and not only is that dedicated computer often unsuitable to use as a development system, but often it is not available to developers. Hence the norm is a “two box world,” where cross development is done on a host system for execution on a possibly quite different target system. (Because developers on the same project may use different host systems, portability across hosts is as important as portability across targets.) Conversely, the resources of the development system, such as source code, are often not available when the target system is running in production, nor even when the product is being installed in a new installation. Support for debugging in

the field is thus a challenging issue. Even the form of the deliverable product introduces problems. Quite often, the product must be in ROM, and tools to support development for ROM are inadequate in many development systems — for example, it is not easy to place optional features in disjoint ROM chips in order to facilitate warehouse stocking of optional configurations. More difficult is the situation where part of the product is in ROM and part of the product is distributed as binary executable — and the binary must work with releases of the ROM older or newer than the one it was intended for.

Timescales of commercial realtime software are the last aspect we will discuss in which such development differs from that of other software. The lifetime of a single installation can be very long, and of course the commercially viable lifetime of the product itself can be much longer. In this situation, evolution of the product is often much more important than the initial release, and supporting evolution is even more important than supporting conformance to the initial specification. The product can even outlive either the target or development system, and field upgrade, possibly by an independent installer, must be supported. Another timescale aspect is that, because of the domain-specific expertise associated with individual programmers, sensible development projects to be assigned to programmers are more likely to take weeks rather than hours, so the locking mechanisms associated with the check-out/edit/check-in cycle of most configuration management systems impede productivity and actually encourage cheating instead of resolving problems of concurrent development.

In contrast to the requirements and problems raised in the foregoing discussion, most existing configuration management systems [2–6] are based on a single development system (e.g. Unix or VMS), assume a shared file system (although perhaps accessed by a network of computers, not just a single machine) and are oriented around a single project. They support a single main thread of development (versions and variants are diversions at any instant, rather than evolving forever in lockstep) and are intended for a collocated team within a single organization.

The Essence of DaSC Configuration Management

In the Harmony project, therefore, we have implemented an approach to configuration management that is markedly different from traditional approaches. This scheme has been used since 1984 in the maintenance and evolution of the Harmony operating system itself, for the maintenance and evolution of various application programs intended to run under the Harmony operating system, and for the maintenance and evolution of various tools used to develop Harmony programs (these tools


must be portable across many operating systems from VMS to the Macintosh OS). The scheme has also been used for assorted unrelated programs written by people familiar with DaSC. Elements of DaSC had been used in configuration management of Thoth [10] and in other projects at the University of Waterloo as long ago as 1976.

The essence of Database and Selectors Cel (DaSC) is to deal separately with the two principal problems: managing the many versions of a program that exist at any one time and managing the evolution of those versions over time. “Database and Selectors” describes how those versions are maintained; “Cel” describes how this machinery evolves. We will describe DaSC in the context of source code for a program, but we also use it for source code of libraries, for user manuals, for application notes, for release bulletins, for product specifications, for bug reports, for change request orders, and for all the other information conventionally kept in configuration management systems.

A prime objective in managing the many versions of a program that may exist at any one time is to minimize redundancy, i.e., to maximize the commonality among representations of the different versions and to exploit it explicitly by representing the commonality only once. This saves storage, affording the same kind of benefit that deltas do when tracking program evolution. It also assists maintenance, in that enhancements and bug fixes in common code need only be made once, and indeed are immediately incorporated in all versions literally sharing the same code, and thus the likelihood of some versions getting out of date is significantly reduced. However, the most important reason for seeking commonality is conceptual: whereas an end user may well only be interested in the version applicable to him, maintainers and installers frequently have to consider all versions.
This happens, for instance, when assessing the impact of some proposed change, or when exploiting existing versions as guides in defining a new version. The range of versions is more readily comprehended if what is in common need only be considered once, and what is different is well identified. This comprehension motivation for commonality of representation strongly implies that the granularity of commonality should be what is a meaningful module in the programming language: we use a single procedure or a single abstract data type as that granularity. (Conditional compilation has often been used by others to represent commonality, but the comprehensibility of code containing conditional compilation directives breaks down when too many versions are combined, or when the conditions become complicated because factors defining versions interact, or when large amounts of unconditional source code must be searched to locate the conditional parts.)

The foregoing point of view leads to thinking of a database of source code for a family of programs, each version defined by a (possibly ordered) set of selectors that identifies in the database the particular granules of code that make up that version. In addition to common code, there will be optional code that may be included in some versions and not others, and there will be code for which there are alternate implementations (called variants) and a version must identify which variant is to be chosen (sometimes more than one variant is needed). Often versions will differ in several orthogonal or at least independent aspects. The keys used to identify granules in the database are the characterization used to describe these aspects. The database together with the sets of selectors for all versions of the program represents the complete family of programs.

Database structure is important for more than retrieval, as it forms the framework for guiding comprehension of the family, and in our experience the common use of hierarchical decomposition for understanding leads to databases structured as trees or near trees. Note that since portability across development hosts is one reason for multiple versions, it may not be sensible to maintain the database on a single machine because it may not be possible to compile all versions on any one machine.

To support evolution in time, we first consider the kinds of modification likely as the product evolves. Change in the code granules is likely, addition of code granules is likely, modification of selector sets is likely, and introduction of selector sets for new versions is likely. Deletion of code granules is also possible, as is deletion of versions. More complex, although infrequent, is restructuring of the database.
Such restructuring can be provoked by enhancements that do not fit within the existing structure, but for a mature product is more likely to result from recognition of a new understanding that increases commonality. Two conclusions can be reached from this consideration of kinds of modifications: for some changes, a complete description of the change in terms of additions and deletions is necessary for understanding, and for some changes, the database and selectors are so tightly coupled that an instantaneous change of view of the complete database and selector set is necessary for consistency.

These kinds of modifications suggest the film animator’s cel as a way of thinking of and representing change. By overlaying a transparent medium on an existing image, the animator can paint on the medium to add to the image or to paint out parts of the image, giving a new composed image without violating the integrity of the original image. The painted layer of transparent medium concisely defines the changes made in the composed image. It provides an identity for a set of changes that is retained, thus before and after images are readily available.


Disjoint changes produced by separate animators can readily be verified by comparing their layers, without resorting to the original image, and can of course be applied in any order. The cel can be used to support temporary experimentation (like delayed posting in database systems). Obviously the original image might itself be composed of many such layers. Of course, animation cels also support reusability.

To use this analogy for configuration management, what is necessary is to have a way of representing layers in the database and of representing layers of selectors. Selectors can obviously point to the layer each granule is to come from. Evolution in time is thus accomplished by adding layers, defining the layers that make up the new base. Our experience is that it is also desirable to reserve separate layers for derived objects, such as relocatable object files and executable images. Tools to support composed views as well as views of individual layers and layer comparison are desirable, in addition to which tools to perform consolidation (collapsing layers) are desirable when that is appropriate. Distribution of a software release to existing users can be by shipping selected layers only. They can apply it to their current release after resolving any interactions with work they have done since the common base release, which will be represented by the layers through which they see the common base.

The use of the cel for evolution in time should be familiar. Although not described this way, it is effectively what has always been used by large computing centers to facilitate transition of locally developed enhancements onto the new releases of computer manufacturer software, which arrive every six months or so. The computer vendor when implementing enhancements, of course, never took into account what its many customers had changed. Collisions were inevitable. Each computing center had to be able to distinguish its own enhancements from the base supplied by the vendor, so that before installing a new release it could retrofit those of its own changes that were still relevant. Third party software suppliers often effectively required additional layers.

DaSC Configuration Management in Harmony

The above description of DaSC is generic and necessarily vague. To make it more concrete we will describe the database, selectors, and cels actually used in Harmony. We have used a tree-structured file system as the database, and absolute pathnames as the individual selectors. The principal motivation for this is portability across development systems. Almost any plausible development system has a tree-structured file system included in the base system price. Consequently, we can count on the database being available, which we could not with a separately priced database even if one compatible with the development system existed. Also Rochkind had pointed out with SCCS [5] that his initial attempts to use a normal database for configuration management ran into the barrier that programmers want to use their conventional tools (editors, compilers, linkers, portability checkers, etc.) on their programs, and that the existing versions of such programs work on files. He did not want to force programmers to give up these tools, nor could he reimplement all such tools, so he was forced to deal with files. Our desire to be able to use commercial mass market tools underscores this point more strongly.

The tree structure of the file system corresponds to a natural hierarchical structure for the database. At the top level, the database is partitioned into documentation, tools, programs, subroutine libraries, etc. Each program is then partitioned into src, the source code for that program, and inc, the sets of selectors for that program. The source code is partitioned into major abstractions, these possibly partitioned into minor abstractions, etc. When there are variants, i.e., alternate implementations of an abstraction, a subdirectory is used to contain each alternative. If the implementation of a variant is sufficiently complex, the variant may itself be partitioned into abstractions. Variants may be nested if there are alternatives, usually arising from different factors, in how a given variant may be implemented. In practice, the tree depth often reaches the maximum permitted (VMS, a major development host, imposes a limit of eight). Portability, with respect to either target or development system, is merely treated as one more cause for having variants. Each abstraction directory contains a text file called abstract.doc that describes the abstraction represented; each variant directory contains a text file called variant.doc that describes what is different about this variant. Each source code file contains a single function, a set of related definitions or a data structure. Note that source files never contain selector or include statements because pathname syntax may be different on different development systems. Redundancy is minimized by placing source files as high as possible in the tree, i.e., files that are common to all alternatives are placed in the common enclosing directory.

The inc directory for a program contains one subdirectory for each supported configuration. The subdirectory is called an inclusion directory, and its primary contents are inclusion or selector files, i.e., files that contain only include statements as in Fig. 1. The structure of the inclusion directory side of the tree parallels that of the corresponding source tree, although it may be somewhat compressed. An inclusion directory for a particular configuration contains three types of


items: inclusion file(s), file(s) of directives for the linker or the library editor, and a documentation text file called version.doc that describes what choices have been made to define this particular version. Each inclusion file is a compilation unit, i.e., it corresponds to functions and abstract data types that should be compiled together because either all the relocatable objects will be linked or none will be linked in a given executable image. Note that all dependencies on pathname syntax are concentrated within the inclusion directory, where, of course, the development system is determined.

#include
#include
#include
#include "Master/harmony/tools/bound/src/imagefmt/coffaux/symtab.h"
#include "Master/harmony/tools/bound/src/bound.h"
#include "Master/harmony/tools/bound/src/imagefmt/imagefmt.h"
#include "Master/harmony/tools/bound/src/m680x0/m680x0.h"
#include "Master/harmony/tools/bound/src/parse/parse.h"
#include "Master/harmony/tools/bound/src/cleanup.c"
#include "Master/harmony/tools/bound/src/getfunc.c"
#include "Master/harmony/tools/bound/src/getlong.c"
#include "Master/harmony/tools/bound/src/getword.c"
#include "Master/harmony/tools/bound/src/lookahead.c"
#include "Master/harmony/tools/bound/src/main.c"
#include "Master/harmony/tools/bound/src/newfunrec.c"
#include "Master/harmony/tools/bound/src/putfifo.c"
#include "Master/harmony/tools/bound/src/readrecs.c"
#include "Master/harmony/tools/bound/src/stack.c"
#include "Master/harmony/tools/bound/src/stacksize.c"

Figure 1. Partial inclusion file for the tool bound for the A/UX development host.

Figure 2 illustrates a typical decomposition, that of the stack bounding tool. The inc directory has inclusion directories for three versions: the first is a version for the Motorola 680x0 family where development is done on Apple’s A/UX Unix system and the object file format is COFF, the second is a version for the Motorola 680x0 family where development is done using the Consulair Mac C system and the object file format is conventional Unix, and the third is a version for the Motorola 680x0 family where development is done on Unity under VMS and the object file format is Whitesmiths’. The src directory illustrates two abstractions, file specification and image format, each of which has implementation variants. Portability across targets appears in that there is a variant directory for the Motorola 680x0, and there would be ones for other processors if they were supported. The Motorola 680x0 directory is divided into abstractions depending on what aspect of the instruction set matters, and for the abstraction that is Harmony idioms, there are variants depending on the way external identifiers map to linkage editor symbols.

The cel is implemented by file system trees parallel to the instantaneous tree just described. These trees might be on separate volumes, or they might be subdirectories of some encompassing directory. Although many layers are possible, and needed in some cases, most commonly there are three: a master layer containing the current mini-release, a working layer for a programmer containing source and inclusion files he has changed, and a derived layer containing output of derivers such as compilers, linkage editors, table builders, etc. Layers can be write-protected as required to retain the integrity of earlier

[Figure 2 shows a portion of the file tree for the tool bound: bound is divided into inc and src; inc contains the inclusion directories cm68aux, um68cmac, and wm68uvms; src contains the abstractions filespec (variants default and mac), imagefmt (variants coffaux, nix, and whtsmths), and m680x0 (containing harmony, with variants macc and noprepend), together with files such as bound.c, bound.link, bound.rsrc, boundext.c, boundvar.c, and version.doc.]

Figure 2. Portion of the file tree for the tool bound.

versions while providing complete access to the code. Figure 3 shows the master layer for a simple application program.

The working and derived layers are parallel to the master layer in that in principle the directory structure is identical; however, they are sparse — only those directories that are not empty are actually present. Addition of files and directories is obvious, because a directory or file is perceived to be in the composed database if either it is in the appropriate directory in the working or derived layer, or, if it is not, it is in the corresponding directory in the master layer. Deletion of files and directories is more subtle: a special file type in the working or derived layers “pastes over” a file or directory of the same name in the master layer to indicate that the file or directory should not be perceived in the composed database. Figure 4 shows the three cases of updating the state of a function in the master layer.

As stated above, the working layer contains the updates for the master layer. However, the selector file being used during development is in the derived layer as shown in Fig. 5, selecting items from the master or working layer. Inclusion files in the derived layer contain pathnames that point to files either in the working layer or in the master layer, depending on where the file in the perceived composite object actually is. By having the current working directory of the development system set to a subdirectory in the derived layer when derivers are executed, derived objects are kept in natural places while preserving the working and master layers as source only.

The process of consolidation or update of the master layer is illustrated in Fig. 6. Inclusion files in the working layer contain pathnames that will be correct when the working layer is consolidated, i.e., merged into the master layer.

Support Tools for DaSC

Part of the attraction of DaSC is portability across development systems. Even on development systems with no automated tools, it can be effective through disciplined use of manual procedures. Indeed, the methodology was originally worked out manually, and we established what tools would be useful by reflection on the manual procedures that had arisen. The most significant tools are tree manipulation tools. Tree-structured file systems have been with us for a long time, but often directories have only been used to isolate related files from unrelated ones. The novelty of tree manipulation tools suggests that the computing community has made relatively little use of the subtree as an organizing structure. The only tool commonly found is the ability found in Unix to walk a tree, executing a command on every file found there; even that tool is missing from many other systems. Actually, the ability to generate pathnames for all files in a tree is sufficient, and indeed better, if one is

[Figure 3 residue: the master layer for the application applic, with a source branch (src) containing main.c, applic.h, extern.c, and fnc3.c, plus variant directories vrnt_a, vrnt_b, and vrnt_c, each holding a variant of fnc2.c, and vrnt_d and vrnt_e, each holding a variant of fnc1.c; and a selector branch (inc) containing vrsn_1, vrsn_2, and vrsn_3, each with its own selector file applic.c. One selector file reads:]

#include "Master/applic/src/applic.h"
#include "Master/applic/src/extern.c"
#include "Master/applic/src/main.c"
#include "Master/applic/src/vrnt_b/fnc2.c"
#include "Master/applic/src/fnc3.c"
#include "Master/applic/src/vrnt_e/fnc1.c"

Figure 3. A layer view of Master for a simple application program.

[Figure 4 residue reconstructed:]
(a) The item fnc2.c is replaced by an updated version in the working layer.
(b) The item fnc3.c in the master layer is marked for deletion.
(c) A new item fnc4.c is added to the working layer.

Figure 4. Three cases of updates in the working layer superseding items in the master layer.

given the ability to execute a command on every file in a list of pathnames. Tools to find files or subtrees in a complex tree structure are invaluable. Several tools facilitate working with parallel trees: duplicating a directory structure to produce a parallel tree, moving a file to the corresponding directory in a parallel tree, setting the current directory to be the corresponding directory in a parallel tree, comparing parallel trees (i.e., comparing layers), displaying the composed tree, and actually merging parallel trees. The current directory concept common in tree-structured file systems is inadequate; because there is often a forest of subtrees of current interest, we need shortcuts to refer to each of them. Visual displays of trees, which can be directly manipulated to add or delete files and directories or to move or duplicate subtrees, aid comprehension and can actually improve productivity and reduce errors. Only a few tools have been needed that are not general tree tools. An editor that can edit many files simultaneously, performing global searches and global substitutions, is a considerable asset when source code is fragmented to the level of granularity we use. Another valuable program is one to produce readable listings from a set of files, providing table of contents, page numbers, headers and footers on each page, concordance, etc.

The following tools are specific to the DaSC methodology. The first is a tool to generate putative inclusion files. Development often proceeds where the database changes without the inclusion files being updated simultaneously, for example because the developer was concentrating on some other version. Given an existing inclusion file in the master layer, the inclusion file wanted in the derived layer is probably the same except that the files are from the composed tree, i.e., files should be taken from the working layer if they are there, files in directories marked for deletion should be omitted, and new files in directories from which other files have been included should probably also be included. Automatically producing such a trial inclusion file, which can then be refined by manual editing, is quite useful. Producing the inclusion file in the derived layer (i.e., with each file reference to the appropriate layer) from the inclusion file in the working layer is simpler, and useful if the developer has kept the latter up to date. Automatically producing the inclusion file for the working layer from that of the derived layer by changing layer references is also useful.

The second tool that is useful is a tool to generate make files for those development systems that use make files. Naive use of make [3] does not work because inclusion files in the derived layer are what are directly compiled, and changes in the composed view of the database are not necessarily indicated by timestamps of changes to these inclusion files. Manually trying to keep make files consistent with inclusion files and the cel mechanism is also unsatisfactory. However, a program that understands DaSC is able to generate make files that run the appropriate derivers.

The third tool is the conventional cross reference to identify what might be affected by changes — except that here this dependency can be ascertained by examining inclusion directories to see what versions of which programs reference files now found in the working layer.

The last tool is one to verify the audit trail. A log of which files changed and why is manually maintained by the programmer as he works on his layer — often the same files are changed many times. Manual record keeping is error prone, so a tool to check that the log indeed corresponds to what is in the layer is reassuring.

In the Macintosh development environment, which we prefer, our tools are sometimes implemented as conventional programs, but they are also often implemented as macros in the QUED/M text editor. Some commercial products, especially MacTree Plus and HFS Navigator, are an important part of a complete tool set.


[Figure 5 residue: the derived layer holds the selector file applic.c; the working layer holds the updated applic.c and fnc2.c, the new fnc4.c, and a deletion marker for fnc3.c; unchanged items (main.c, applic.h, extern.c, fnc1.c, and the master copies of applic.c, fnc2.c, fnc3.c) remain in the master layer. The selector file in the derived layer reads:]

#include "Master/applic/src/applic.h"
#include "Master/applic/src/extern.c"
#include "Master/applic/src/main.c"
#include "Working/applic/src/vrnt_b/fnc2.c"
#include "Working/applic/src/fnc4.c"
#include "Master/applic/src/vrnt_e/fnc1.c"

Figure 5. Selector file in the derived layer, updated items in the working layer.

[Figure 6 residue: items in the working layer are consolidated into the master layer: fnc2.c and applic.c replace their master counterparts, fnc4.c is added, and fnc3.c is deleted.]

Figure 6. Consolidation of updated objects from the working to the master layer. The selector file in the working layer is identical to that in the derived layer, except that it points to the revised items in the master layer. Before consolidation it is non-functional, in that it may point to non-existent objects.


How DaSC Differs from Traditional Configuration Management

No Check-out

We consider it a management responsibility to ensure that projects assigned to programmers have minimal overlap, so that update collisions and update omissions will be rare. Evolution occurs as a sequence of mini-releases. When an assigned project has been completed by a programmer, and it has been tested, it is then consolidated with the master layer to form a new base. This may actually involve merging the trees, or it may merely be using the composed image as if the layers were merged, but the important issue is that the layers are consolidated in the same sequence for all users of the database. That means that when a layer is consolidated, everyone else must check that the layer being consolidated does not interact with the layer he is working on, or correct the work in his layer to accommodate any interactions that are found. Our experience is that such rework is minimal in practice. Note that developers on different machines with disjoint file systems need not actually perform the consolidation simultaneously, just so long as when the consolidations are performed they are done in the right sequence.

Long Developer Assignments

We observe that assignment of larger projects to individual programmers often has management advantages, because projects demand domain-specific knowledge and less manpower is spent bringing fewer programmers up to speed on a given project. Larger individual projects tend to mean reduced overlap among projects, which suits our configuration management scheme. The mini-release style of evolution allows a programmer to proceed at his own rate to complete the project before consolidation. We have observed that the locking associated with check-out/check-in configuration management often puts pressure on the programmer working on a large project to check in when the project is only partially completed, simply to release locks that are interfering with the work of other programmers. This often causes bugs, because these intermediate stages in his project are inconsistent.

No Configuration Description Language

The DaSC configuration management system is not based on a syntactic description of configurations. There are two reasons for this choice. We have tried unsuccessfully to find some succinct set of attributes that could characterize the code needed for the different versions we actually maintain. However, the wide diversity of products in the board-level marketplace seems to defy such simple characterization, so we have been reduced to enumerating the particular components, and particular variants, needed for each specific version. Also, the duality of selector files, both as selectors of the components of a configuration and as compilation units submitted to a compiler, fits well with the goal of using "commodity" personal computers and a variety of compilers for software development.

File System Rather than Conventional Database

Using the file system as a database has meant some frustrating limits, such as the number of levels in the tree, the total length of the absolute pathname, or the number of files in the tree. On the whole, however, it has proved a satisfactory choice. It has proved sufficient for representing the structures we need. Conventional databases do not offer much more. Conventional databases can represent richer relations than trees, but files of pathnames do too, and we have found the discipline of working with a tree or forest useful in improving the conceptual organization of the database. (For example, shared subtrees often turn out to be software components that should be promoted to a level where sharing disappears.) The items to be stored in the database are large and of varying size, and although object-oriented databases could handle them, conventional databases do not work well with variable-sized objects. (Of course, the database could contain tokens that are file names, where the variable-sized objects are stored in those files, but then the database cannot guarantee access control.) Extraction directives, such as SQL boolean operations, are not useful in characterizing a selector set. Conventional databases provide automatic concurrency control, but we want to avoid that because it implies a shared file system. Some conventional database systems support distributed databases, but we finesse the distributed update problem, which is what they solve. Database systems often facilitate storing typed objects, but we have found the file system adequate to store objects in files and to identify types by filename extension or by location in the tree. Database systems often provide backup and recovery mechanisms such as journaling and saved befores, but we find normal backup procedures with conventional file systems to suffice.

Conditional Compilation Banned

Normal industry practice involves several techniques for sharing common code between programs:

• conditional compilation directives, which include or exclude a particular textual region based on a compile-time switch

• include statements, which specify that the contents of a specified file are to be inserted at that point

• macros, which define a common abstraction in which the program can be written but where the definition of the macros is different for each version


• procedures, which again provide a common interface that defines an abstraction in which the program can be written but where the definition of the procedures is different for each version

• procedure name variables, e.g., configuration record entries, which are similar to procedures except that more than one variant can be available at run time

• inversions, where the common code itself is put in procedures, which are then called from the version-specific code

All the techniques other than conditional compilation have the property that someone reading the set of versions, perhaps for guidance in producing a new variant or perhaps to attempt to identify further commonality, can readily locate the differences without having to examine the mass of common code. With conditional compilation, by contrast, the more successfully commonality has been exploited, the greater the mass of common code that must be examined to locate the differences.

We have already indicated that code containing conditional compilation directives becomes quite unreadable when variants associated with several different factors interact. Interleaved conditional compilation directives are incomprehensible, and the code expansion if conditional compilation directives are not interleaved can be intolerable. We use all the techniques other than conditional compilation (with include statements strictly in inclusion files). We have tried using conditional compilation in inclusion files, where the objections cited are not so severe. The hope was that the number of distinct inclusion files that need to be manually maintained could be reduced. Our experience was that the benefits did not outweigh the increased awkwardness of creating and maintaining the inclusion files.

Inclusion Files not Factored

In defining a version, the inclusion files point directly at elements of the database. It might be argued that there are subversions that are common to several versions, and that these could be represented explicitly, with the inclusion files for any versions that use them pointing to the inclusion files for the subversion. One argument for this might be to reduce the number of inclusion files that need changing if the database is reorganized; another argument might be to facilitate the definition of a new version. We have not done this, in part because the explicit pathname syntax in inclusion files prevents one form of such sharing that would be useful (i.e., among otherwise identical versions configured for different development systems), and in part because components of the pathnames in the inclusion files have been adequate hints to suggest what might be in the inclusion file for some new version.

Evolution in a Single Sequence with no Branches

It is not essential to the DaSC methodology that evolution be a single sequence with no branches. Indeed, the progenitor model of the computer center retrofitting its changes to a new release of vendor software is a clear case where this description is inadequate. However, our experience on our own projects is that the ability to support many versions simultaneously subsumes many of the factors that lead other configuration management systems to represent evolution by a tree. The policy that bug fixes are strictly issued as part of a new product release also helps eliminate branching evolution. This policy is not arbitrary but is an economic necessity in the commercial mass marketplace, where individual patches cannot be shipped to individual customers. Such a policy, of course, implies sufficient software quality that adequate workarounds can be found for the interim before the next release.

Progress Monitoring not in Configuration Management System

Some configuration management systems incorporate features that purport to allow managers to monitor progress on projects. We have indicated that in some situations, such as projects built by consortia of companies, this is undesirable. In some situations, such as where developers are geographically distributed or where development is on freestanding personal computers, this is impractical. More generally, we think it is a bad idea. We know of no automated or semi-automated metrics that are useful for predicting how far a product is from completion, never mind how fast progress is being made. Most metrics, like lines of code added or modified, are worse than irrelevant, because they have a deleterious effect on product quality as people suboptimize their behaviour to get better metrics. Completion of projects (milestones) is at least relevant, but gives little guidance as to how long the remaining work will actually take. We believe that to assess progress there is no replacement for professional judgement by managers, based on code reading, structured walk-throughs, and ongoing discussions with the programmers.

Conclusions

This paper presented an argument why integrated, monolithic configuration management systems are not well suited to commercial realtime systems. The paper introduced the concept of layering parallel trees as an abstraction model for evolving software. Layering of trees was shown to solve the vexing problem of how new versions can be introduced without invalidating earlier ones. The configuration management scheme presented here deliberately does not impose a specific database management system, but rather it uses the conventional tree-


structured file system as its basis. While a conventional tree-structured file system is not ideal for maintaining code databases, it is the most flexible and most generally available in all plausible development systems. The selector file concept was shown to offer many advantages. It permits explicit definition of specific configurations by enumeration of components. This approach has proven to be more flexible than schemes based on specification of attributes. Also, the selector file, which defines a particular configuration, is the unit that is submitted to the compiler, therefore guaranteeing consistency between intent and what is actually compiled.

The DaSC approach is undoubtedly not the only way to address the issues of commercial realtime software development, but it has proved more satisfactory than conventional configuration management schemes.

References

1. W.F. Tichy. Tools for Configuration Management. In Berichte of the German Chapter of ACM, Vol. 30, ed. J.F.H. Winkler, Proceedings of the International Workshop on Version and Configuration Control, Grassau. January 1988. pp. 1–20.

2. DEC/CMS Code Management System, Digital Equipment Corporation, Product No. QL-007A#AA/LX.

3. S.I. Feldman. Make — a Program for Maintaining Computer Programs. Software Pract. Exper. 9(3): 255–265; 1979.

4. D.B. Leblang, R.P. Chase Jr., and H. Spilke. Increasing Productivity with Parallel Configuration Manager. In Berichte of the German Chapter of ACM, Vol. 30, ed. J.F.H. Winkler, Proceedings of the International Workshop on Version and Configuration Control, Grassau. January 1988. pp. 21–37.

5. M.J. Rochkind. The Source Code Control System. IEEE Trans. Software Eng. SE-1(4): 364–370; 1975.

6. W.F. Tichy. RCS — A System for Version Control. Software Pract. Exper. 15(7): 637–654; 1985.

7. W.M. Gentleman. Managing Configurability in Multi-Installation Realtime Programs. In Proceedings of the Canadian Conference on Electrical and Computer Engineering, Vancouver, B.C. November 3–4, 1988. pp. 823–827.

8. W.M. Gentleman, S.A. MacKay, D.A. Stewart, and M. Wein. Using the Harmony Operating System, Release 3.0. NRC/ERA-377. National Research Council of Canada, Ottawa, Ont. February 1989.

9. W.M. Gentleman, S.A. MacKay, D.A. Stewart, and M. Wein. An Introduction to the Harmony Realtime Operating System. Newsletter of the IEEE Computer Society Technical Committee on Operating Systems. Summer 1988. pp. 3–6.

10. T.A. Cargill. Management of Source Text of a Portable Operating System. In Proceedings of COMPSAC '80, IEEE Computer Society Fourth International Computer Software and Applications Conference, Chicago, IL. October 1980. pp. 764–768.