A New Approach to Consistency Control in Software Engineering

Georg Heidenreich, Mark Minas
CS IMMD 2, University of Erlangen, Martensstrasse 3, D 91058 Erlangen, Germany
{heidenre,minas}@immd2.informatik.uni-erlangen.de

Abstract

Due to the growing size of projects, rising costs, and uncertain progress in development, the limitations of classical software engineering techniques have been obvious for many years. Applying quality assurance methods to software development has been one answer since the beginning of the 1990s. But standards like ISO 9000 [Int91] and even customized quality assurance plans can only serve as a guideline; too many details depend on the project's tasks, their distribution, the design methods, and the programming languages used. Experience in large projects shows that computer-based process management is an important contribution to implementing and supporting quality assurance methods. Therefore, we introduce a representation of software process dependencies and, during development, a labelling of software elements and product versions. Using these labels one can coordinate versions and variant designs and reconstruct elements of old versions automatically. We demonstrate some applications to important problems of software project management which cannot be solved, or even detected, with today's standard methods.

1 Introduction

Recent work on project planning, configuration management, development processes, change tracking, and quality assurance gives the impression of research areas growing towards a common understanding of how to manage the software engineering process. Many of these tasks can be supported by automated transformations. Therefore an integrated, parameterized representation of the meta-level of software development may be useful for modelling the information flow and design history of any project.

This work has been supported in part by the German Federal Department of Research and Education (BMBF) under contract FKP 0004401B4A.

Detlef Kips BASYS GmbH Am Weichselgarten 4 D 91058 Erlangen, Germany [email protected]

Examining a given development project, one can abstract away from the specific design methods and formalisms used, so that a model of documents and of the tasks performed on these documents can be extracted. Within that life-cycle model the organisation of configuration management and consistency control can be automated, liberating the developer from some trivial (but nevertheless important) tasks such as finding the source files of the executable program, selecting source files for the next release, or deciding about the compatibility of code revisions.

The tasks of configuration management have not yet been expressed in a formal way, and there is no theory for evaluating or comparing configuration management systems. Thus, one goal of this paper is to introduce some formal concepts of configuration management. We suggest distinguishing between a document name and its contents (revision) with respect to a specific software version (configuration). A single document typically has many revisions, which should be immutable once they are assigned to a configuration. A configuration management system should be able to...

CM1 ... allow for coexisting variant revisions and configurations. This can only be accomplished by not only naming documents but also identifying revisions and configurations by unique labels.

CM2 ... automatically derive a new revision of document D if the given configuration C contains revisions of all documents which D depends on. This requires knowledge of how documents are generated from other documents.

CM3 ... determine which documents need to be changed in a configuration C after a revision R of document D has been added to C. This is a very frequent situation and requires some representation of how documents depend on other documents. It also requires a relation determining whether revisions of depending documents are consistent (i.e. may be used in the same configuration).

CM4 ... identify derived revisions if they have been derived using the same predecessor revisions. This can be done by using the labels of the predecessor revisions to generate the label r of a derived revision R. Rebuilding a new revision from the predecessors already used for R would then generate the same label r.

CM5 ... decide whether a given revision R is part of a given configuration C. This is typically needed before a software product version is shipped to customers. If somebody wants to check for the inclusion of certain features in a given source code revision, one should be able to check whether this revision is part of the product configuration. Thus a CM system should provide a relation (contains) between configurations and revisions.

CM6 ... extract the revision of document D which has been used to build configuration C. For that purpose, not only the contains relation but also a mapping (belongsto) from revisions to their corresponding documents is needed. The configuration management system also has to provide a mapping defines from documents and configurations to revisions, giving the revision of document D which is actually contained in configuration C.

Although widely used, “make” together with Tichy's revision control system “rcs” supports only tasks CM1 to CM3 of our list [Tic85] [Fel79]. In this paper, we use an intensional approach in order to reason about the compatibility of software elements and to give solutions for all tasks in the list. Configurations and revisions will be labelled, and revisions of different documents are considered to be consistent if their labels are compatible. We also compute the mapping belongsto and the relation contains using only the labels of revisions and configurations. Our basic assumption is that the labelling corresponds to the semantical dependencies between software revisions. Therefore, the construction mechanism for the labels assumes that a revision derived from another revision R of a different document is semantically consistent with R.

We also show that comparing a software project to a distributed computation reduces the problem of building a consistent product version to obtaining a snapshot of a consistent global state. The model of vector clocks, as developed independently by Fidge and Mattern, provides a solution to the problem of deciding consistency of global states [Fid88] [Mat88]. Our original contribution is to apply vector clocks to software projects. Using vector time stamps as version labels, we will describe a theoretically founded model for intensional software configuration management. We also give some examples of how this method may be used in a development environment. In a real-world software engineering environment, constructing and using vector time stamps would be completely hidden from the developers, just as the mechanism of the “last-modify-date” of files is not apparent when the “make” tool selects appropriate steps for building software.

2 A project model

The information in a “makefile” describes a software project by a set of unique document names and, attached to each name D, a description of the documents D depends on, together with a statement defining how D is derived from these documents. Each rule within a “make” script starts with the target document $D_i$ and (after a “:”) the list of documents on which $D_i$ depends. In order to represent the dependency information between documents in a life-cycle model we introduce the dependency vector of a document. The dependency vectors describe the dependencies given by the “make” scripts, but they can also describe the use of knowledge about documents when another document is edited manually.

Let N be the constant number of documents in a given project. Each dependency vector has length N, where position $i = 1, 2, \ldots, N$ corresponds to document $D_i$, and describes the semantical dependency between documents. For each document $D_i$ there is a dependency vector which is assumed to be constant throughout the project and represents the flow of information between documents. If the reflexive, transitive closure of the dependency relation is represented by a matrix, the dependency vectors are just the columns of that matrix.
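As an illustration of the closure mentioned above, the following Python sketch computes dependency vectors from direct dependencies such as those listed after the “:” of a “make” rule. The function name `dependency_vectors` and the concrete rules of the example (D2: D1, D3: D1, D4: D2 D3) are assumptions made for this sketch; they are chosen so that the result matches the vectors of the example project in Fig. 1.

```python
from typing import Dict, List

def dependency_vectors(direct: Dict[int, List[int]], n: int) -> Dict[int, List[int]]:
    """Return d_j for every document j (1-based) as a 0/1 list of length n.

    `direct[j]` lists the documents named in the rule for D_j (hypothetical input
    format). The result is the reflexive, transitive closure of that relation.
    """
    vectors = {}
    for j in range(1, n + 1):
        reached = {j}                      # reflexive: d_j[j] = 1
        stack = list(direct.get(j, []))
        while stack:                       # follow indirect dependencies
            i = stack.pop()
            if i not in reached:
                reached.add(i)
                stack.extend(direct.get(i, []))
        vectors[j] = [1 if i in reached else 0 for i in range(1, n + 1)]
    return vectors

# Assumed rules for the example project: D2: D1, D3: D1, D4: D2 D3
fig1 = dependency_vectors({2: [1], 3: [1], 4: [2, 3]}, n=4)
print(fig1[4])  # [1, 1, 1, 1]
```

Because the closure is computed once from the static rules, the vectors can be treated as constants for the rest of the project, as the text assumes.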

Definition 2.1 Let N be the number of documents in a project; then $d_j$ is the dependency vector of length N belonging to document $D_j$. If document $D_j$ is derived from or (even indirectly!) uses information of document $D_i$, then $d_j[i] := 1$ (of course $d_i[i] = 1$ holds). On the other hand, if changes in $D_i$ do not affect $D_j$, then $d_j[i] := 0$.

Note that the definition of dependency vectors in a given project is axiomatic and describes inherent semantical properties of that project. In order to give examples, we will use a graphical representation of the project information. We will represent the dependency vectors by depon edges between documents, which are represented by nodes. Note that in the figures information flows from left to right and that the dependency arrows are drawn in the reverse direction. Revisions are also represented by nodes and connected to their documents by a belongsto edge. Let us assume that the dependency and processing information of an example project has been defined using something like a “make” script. Figure 1 shows a small example project with the dependency vectors represented by both numerical values and depon edges.

Fig. 1 A graphical project model (documents D1 to D4 connected by depon edges; dependency vectors d1 = (1,0,0,0), d2 = (1,1,0,0), d3 = (1,0,1,0), d4 = (1,1,1,1))

A document only describes the role of a software element within the project; the specific contents of a document are its so-called revisions. There are typically many revisions for each document. Revisions represent historical progress and variant features of software elements. As we will see in this paper, the dependency vectors are used when examining the compatibility of revisions used for building a software product version. Each revision R will be associated with a vector containing dependency information collected from all revisions of different documents which have been used in order to build R. Revisions of different documents belonging to one version of the product under construction make up a configuration. Speaking in terms of a multi-valued database, one object (document) can have multiple values (revisions), one of which is selected by the value function (configuration). During development, configurations may also be partial, i.e. they do not necessarily contain revisions for each document in the system. But since the set of revisions must be consistent, it reflects the structure of the development process. We can show that compatibility can be guaranteed from an intensional point of view if the vectors of all revisions in a configuration have equal values (unless zero) in their components (see lemma 3.3).

Consistency in a configuration may be defined informally by the following rules:

1. A configuration has at most one revision per document and thus may be represented as a partial function.

2. If a configuration contains a revision for a document $D_j$, it must also contain revisions for all documents $D_i$ which $D_j$ depends on. As a consequence of these two rules, in a given configuration, one can only add revisions to subsequent documents in the life-cycle.

3. In the same configuration, any two revisions $R_i$, $R_j$ belonging to documents $D_i$, $D_j$ which are semantically dependent have to be compatible with each other.

The concept of configurations was introduced to represent visibility scopes and compatibility of persistent file versions. Configurations link together compatible versions of software elements, separating the development history from the life-cycle model. In order to develop new versions cooperatively, new mechanisms of revision transfer have to be introduced which allow group members to share the latest data in a consistent view of the project database. A developer can only add consistent revisions, and once a configuration contains a revision, that revision may not be replaced or deleted. Thus configurations are the units of recovery and consistency, allowing for the reconstruction of any product version. Modelling change means creating another configuration in which at least one revision is new, whereas unchanged revisions can be copied from an ancestor configuration. Configurations may now be defined using the informal properties given above:

Definition 2.2 Let $\mathcal{D}$ be the set of documents in a project, and $\mathcal{R}$ be the set of revisions. Then a configuration is a partial mapping $C : \mathcal{D} \rightharpoonup \mathcal{R}$ with the following properties.
A configuration is closed under the dependency relation:
$\forall D_i, D_j : (\exists R_i : C(D_i) = R_i \wedge d_i[j] = 1 \Rightarrow \exists R_j : C(D_j) = R_j)$
All document revisions are compatible with each other:
$\forall D_i, D_j : C(D_i) = R_i \wedge C(D_j) = R_j \Rightarrow R_i$ is compatible to $R_j$
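The closure condition of Definition 2.2 can be checked mechanically. The sketch below is only an illustration: it represents a configuration as a Python dict from document index to a revision name, uses the dependency vectors of the example project in Fig. 1, and the name `is_closed` is hypothetical. The compatibility condition is not covered here, since compatibility is only defined intensionally later in the paper.

```python
from typing import Dict, List

# Dependency vectors of the Fig. 1 example; DEP[i][j-1] == 1 means D_i depends on D_j.
DEP: Dict[int, List[int]] = {
    1: [1, 0, 0, 0],
    2: [1, 1, 0, 0],
    3: [1, 0, 1, 0],
    4: [1, 1, 1, 1],
}

def is_closed(config: Dict[int, str]) -> bool:
    """Check the closure condition of Definition 2.2 for a partial mapping.

    `config` maps a document index to the name of its revision. A dict holds at
    most one revision per document, which covers informal rule 1.
    """
    for i in config:
        for j, needed in enumerate(DEP[i], start=1):
            if needed and j not in config:
                return False   # D_i is present, but a document it depends on is not
    return True

print(is_closed({1: "R11", 2: "R22"}))   # D2 only needs D1
print(is_closed({1: "R11", 4: "R42"}))   # D4 also needs D2 and D3
```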

The term compatible refers to semantics inherent to the project and to the formalisms used for $D_i$ and $D_j$. These are not considered here; instead we will provide an intensional definition of compatibility in the sequel. A mapping of all documents into revisions is called a cut; it represents a complete description of a system state, as opposed to a configuration, which describes consistent revisions of a subset of all documents.

Definition 2.3 Let $\mathcal{D}$ be the set of documents in a project, and $\mathcal{R}$ be the set of revisions. Then $C : \mathcal{D} \rightarrow \mathcal{R}$ with $C(D) = R \Rightarrow R$ is a revision of $D$ is called a cut.

A set of consistent revisions of all documents is called a consistent cut and is equivalent to a complete configuration of a software product.

3 Consistent configurations

This section explains a technique for determining the consistency of configurations. We will outline the relation of Mattern's method of vector clocks to the description of software projects [Mat88]. Comparing the life-cycle model given by a system of dependency vectors to a distributed computation, the documents can be regarded as processes with changing internal states which correspond to revisions. The state transitions are inherently concurrent and are a good description of developing items of a large system in parallel. As in an asynchronously coupled network of engineering workstations (or desktops), a distributed computation only assumes asynchronous communication and, as a consequence, the absence of global time. (Asynchronous communication is not useful for synchronization purposes, because transmitting timing information itself takes an a priori unknown amount of time.) When determining consistent cuts, the method of vector clocks [Fid88], [Mat88] can be applied to building configurations. As we will point out in this section, vector timestamps turn out to be a special description of compatibility, so these vectors will be called compatibility vectors. Compatibility vectors are associated with each revision R and are created by copying the value of the corresponding document's local vector clock at the time of creation of R.

Let N be the number of documents in a project graph. The vector clock $CLK_i$ of document $D_i$ is a vector of length N with the following properties:
- $CLK_i[i]$ is the local logical time of $D_i$, and
- $r_{i,m}[i] > 0$ was the local logical time when revision $R_{i,m}$ of $D_i$ was created.
- $CLK_i[j]$ was the local time of $D_j$ when the latest revision $R_{j,m}$ of $D_j$ was created which influences the current state of $D_i$.

Definition 3.1 The compatibility vector $r_{i,n}$ of length N of each revision $R_{i,n}$ (n being the index of an arbitrary revision of $D_i$) is constructed using the vector clock $CLK_i$ with the following procedure ($1 \le i \le N$, $1 \le k \le N$):

1. The initial state, before starting to build revisions at all, is
$\forall i, k : CLK_i[i] = 1 \wedge (i \ne k \Rightarrow CLK_i[k] = 0)$

2. When document $D_i$ is edited and revision $R_{i,m}$ is created, the local time of $D_i$ will be incremented and the new value will be used to label $R_{i,m}$ with a compatibility vector $r_{i,m}$ of length N:
$CLK_i'[i] := CLK_i[i] + 1$
$CLK_i'[j] := CLK_i[j]$ with $i \ne j$
$r_{i,m} := CLK_i'$.

3. When a new revision $R_{i,n}$ of document $D_i$ is derived using revisions $R_{j,m}$ of each document $D_j$ with $d_i[j] = 1$, and with $R_{j,m}$ being labelled with the compatibility vector $r_{j,m}$, then the new value for the vector clock of $D_i$ will be:
$\forall k : CLK_i'[k] := \max(CLK_i[k], \ldots, r_{j,m}[k], \ldots)$
and this will be used to label the new revision $R_{i,n}$ with a compatibility vector: $r_{i,n} := CLK_i'$.

Instead of N clocks, one for each document, a single clock for the whole system may also be used. Furthermore, instead of logical clocks any built-in system clock or any other counter can be used as well, provided the counter is incremented each time a revision stamped by that counter is generated. Due to the construction mechanism of vector clocks, any compatibility vector of a given document $D_j$ will have a non-zero value in every position i which is assigned to a preceding document $D_i$. This property is closely related to the dependency vector $d_j$ of $D_j$:

Lemma 3.1 The dependency vector $d_j$ of document $D_j$ satisfies:
$d_j[i] = 1$ if and only if $r_j[i] \ne 0$
$d_j[i] = 0$, otherwise
for arbitrary compatibility vectors $r_j$ of revisions of document $D_j$.

Proof by induction: The compatibility vectors of documents $D_i$ without any predecessors ($\forall j \ne i : d_i[j] = 0$) only have their own local clock value $r_i[i]$. Let lemma 3.1 be true for all documents $D_p$ from which $D_q$ is derived ($d_q[p] = 1$); then ($d_p[m] = 1 \Rightarrow r_p[m] \ne 0$) and, by the definition of $d_q$, also ($d_p[m] = 1 \Rightarrow d_q[m] = 1$) holds. In the derivation rule for the vector clock $CLK_q$, exactly those revisions of the documents $D_p$ are used in order to determine the subsequent vector clock value of $CLK_q$. Thus
$r_q[m] \ne 0 \Leftrightarrow ((q = m) \vee r_p[m] \ne 0) \Leftrightarrow d_q[m] = 1$ □
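The construction rules of definition 3.1 can be traced on the example project. The following Python sketch is an illustration under stated assumptions: the function names `edit` and `derive` are hypothetical, the dependency vectors are those of Fig. 1, and the scenario (D1 is edited once, all other revisions are derived) is chosen for the example. It also checks lemma 3.1 for the generated labels.

```python
from typing import Dict, List, Tuple

N = 4
# Dependency vectors of the Fig. 1 example project.
DEP = {1: (1, 0, 0, 0), 2: (1, 1, 0, 0), 3: (1, 0, 1, 0), 4: (1, 1, 1, 1)}

# Rule 1: initially CLK_i[i] = 1 and CLK_i[k] = 0 for k != i.
clk: Dict[int, List[int]] = {
    i: [1 if k == i else 0 for k in range(1, N + 1)] for i in range(1, N + 1)
}

def edit(i: int) -> Tuple[int, ...]:
    """Rule 2: D_i is edited; increment its local time and return the new label."""
    clk[i][i - 1] += 1
    return tuple(clk[i])

def derive(i: int, inputs: List[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Rule 3: derive a revision of D_i; take the componentwise maximum."""
    for r in inputs:
        clk[i] = [max(a, b) for a, b in zip(clk[i], r)]
    return tuple(clk[i])

def satisfies_lemma_3_1(j: int, r: Tuple[int, ...]) -> bool:
    """d_j[i] = 1 exactly when r_j[i] != 0."""
    return all((d == 1) == (x != 0) for d, x in zip(DEP[j], r))

r1 = edit(1)              # (2, 0, 0, 0)
r2 = derive(2, [r1])      # (2, 1, 0, 0)
r3 = derive(3, [r1])      # (2, 0, 1, 0)
r4 = derive(4, [r2, r3])  # (2, 1, 1, 1)
rebuilt = derive(4, [r2, r3])  # same inputs again
print(r4, rebuilt == r4)
print(all(satisfies_lemma_3_1(j, r) for j, r in [(1, r1), (2, r2), (3, r3), (4, r4)]))
```

Rebuilding the revision of D4 from the same inputs yields the same label, since rule 3 never increments the local time; this is the property task CM4 relies on.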

Lemma 3.2 In any collection of vector clock values (snapshot) made at a single instant of global time, the local time $CLK_j[j]$ of any document $D_j$ is greater than or equal to the value of $CLK_i[j]$ in the vector clock of any other document $D_i$:
$\forall i, j : CLK_j[j] \ge CLK_i[j]$.

Proof trivial. □

Note that, due to the properties of distributed systems, global time in general cannot be observed and such a snapshot in general cannot be constructed. However, in a real-world distributed development environment, a global time synchronization can be added in order to obtain snapshots of the environment. Using compatibility vectors one can determine whether a revision $R_{j,m}$ has been derived using $R_{i,n}$, which is important when building a consistent cut. Suppose the corresponding document $D_j$ depends on $D_i$ ($d_j[i] = 1$). In a consistent cut, all the clock values $R_{i,n}$ knows about have to be the same as in $R_{j,m}$. On the other hand, there may be clocks which $D_j$ sees, but which $D_i$ cannot see. Thus, with compatibility vectors, we can determine consistency and inconsistency:

Definition 3.2 Causal relationship between revisions ($\rightarrow$), ($1 \le k \le N$):
$R_{i,n} \rightarrow R_{j,m} :\Leftrightarrow \forall k : (r_{j,m}[k] = r_{i,n}[k] \vee r_{i,n}[k] = 0)$

Definition 3.3 Let C be a cut, and let $D_i$ and $D_j$ be arbitrary documents ($1 \le i, j \le N$):
C is consistent $:\Leftrightarrow \forall i, j : (d_j[i] = 1 \Rightarrow (C(D_i) \rightarrow C(D_j)))$.

In general, two revisions in a consistent cut have to be derived from identical revisions of any document they both can see. Component i of all compatibility vectors in a consistent cut may only have one value r[i] (apart from zero). Thus, two revisions are consistent if and only if their compatibility vectors have equal values in those components which are non-zero for both vectors:

Definition 3.4 Consistency between revisions ($\leftrightarrow$), ($1 \le k \le N$):
$R_{i,n} \leftrightarrow R_{j,m} :\Leftrightarrow \forall k : ((r_{j,m}[k] = r_{i,n}[k]) \vee (r_{i,n}[k] = 0) \vee (r_{j,m}[k] = 0))$

Of course, for two arbitrary revisions $R_i$ and $R_j$ of different documents $D_i$ and $D_j$:
$(R_i \rightarrow R_j) \vee (R_j \rightarrow R_i) \Rightarrow (R_i \leftrightarrow R_j)$ holds (but not $\Leftarrow$).

Lemma 3.3 Let C be a cut, and $D_i$ and $D_j$ arbitrary documents ($1 \le i, j \le N$):
C is consistent $\Leftrightarrow \forall i, j : (C(D_i) \leftrightarrow C(D_j))$

Fig. 2 Generating vectors (revisions R11, R22, R31, R42 attached by belongsto edges to documents D1 to D4; r1 = (a,0,0,0), r2 = (a,b,0,0), r3 = (a,0,c,0), r4 = (a,b,c,1) = c)

Proof: Let C be a consistent cut and $r_i$ and $r_j$ the compatibility vectors of $C(D_i)$ and $C(D_j)$, respectively. For each position k in the vectors, at least one of the following statements is true:

a) $d_i[k] = 0 \Rightarrow r_i[k] = 0$, see lemma 3.1.
b) $d_j[k] = 0 \Rightarrow r_j[k] = 0$, for the same reason.
c) $d_i[k] = 1 \wedge d_j[k] = 1$
$\Rightarrow C(D_k) \rightarrow C(D_i) \wedge r_i[k] \ne 0 \wedge C(D_k) \rightarrow C(D_j) \wedge r_j[k] \ne 0$ (due to definition 3.3 and lemma 3.1)
$\Rightarrow r_i[k] = r_k[k] \wedge r_j[k] = r_k[k]$ (due to definition 3.2)
$\Rightarrow r_i[k] = r_j[k]$

Thus we conclude that C is consistent if and only if ($\forall i, j\ \forall k : r_i[k] = 0 \vee r_j[k] = 0 \vee r_i[k] = r_j[k]$), which is equivalent to $(C(D_i) \leftrightarrow C(D_j))$ by definition 3.4. □

Figure 2 shows a set of consistent revisions in our example project. Each revision $R_i$ of any preceding document $D_i$ used in order to build a revision of another document is identified by the local identifier $r_i[i]$, and the composition of these revision identifications makes up the compatibility vector. Since a consistent cut describes a set of revisions which have the same component values in their compatibility vectors, one can also define a compatibility vector for the consistent cut:

Definition 3.5 The compatibility vector c of a consistent cut C is defined by:
$c := \sup\{r_1, \ldots, r_n\}$ with $\sup\{r_1, \ldots, r_n\}[k] = \max(r_1[k], \ldots, r_n[k])$
and $r_i$ being the compatibility vector of $C(D_i)$ for any document $D_i$.

Lemma 3.4 The compatibility vector of a consistent cut consists of the local logical clock values of its revisions.
$\forall k : c[k] = r_k[k]$, $1 \le k \le N$

Proof by applying lemma 3.2 componentwise. Let $i \ne j$, $c[i] = r_j[i]$ and $c[i] > r_i[i]$; then $r_j[i] > r_i[i]$ would hold, contradicting lemma 3.2. So $r_i[i]$ is exactly the maximum of component i in all compatibility vectors of the consistent cut. □

The definition of causal relationship and consistency can be extended to compatibility vectors of cuts, too. The compatibility vectors can easily be used to identify revisions and to determine both dependencies and compatibility of arbitrary revisions. The principle of the “make” tool can be generalised for both the resolution of access conflicts and the reconstruction of historical configurations. In either situation the solution is the consistent completion of revision subsets, as we will demonstrate in the next section.
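The relations of definitions 3.2 and 3.4 and the supremum of definition 3.5 translate directly into componentwise tests. The sketch below is a minimal Python illustration; the function names are hypothetical, and the concrete numbers stand in for the symbolic values of Fig. 2 under the assumption a = 2, b = 1, c = 1.

```python
from typing import Sequence, Tuple

Vector = Tuple[int, ...]

def causes(r_in: Vector, r_jm: Vector) -> bool:
    """Definition 3.2: R_in -> R_jm iff every non-zero component of r_in equals r_jm's."""
    return all(a == 0 or a == b for a, b in zip(r_in, r_jm))

def consistent(r1: Vector, r2: Vector) -> bool:
    """Definition 3.4: equal values wherever both vectors are non-zero."""
    return all(a == 0 or b == 0 or a == b for a, b in zip(r1, r2))

def cut_vector(revs: Sequence[Vector]) -> Vector:
    """Definition 3.5: componentwise maximum (supremum) of the revision vectors."""
    return tuple(max(col) for col in zip(*revs))

# Fig. 2 with the assumed values a = 2, b = 1, c = 1.
r1, r2, r3, r4 = (2, 0, 0, 0), (2, 1, 0, 0), (2, 0, 1, 0), (2, 1, 1, 1)
cut = [r1, r2, r3, r4]

pairwise_ok = all(consistent(x, y) for x in cut for y in cut)  # test of lemma 3.3
c = cut_vector(cut)
diagonal = tuple(r[k] for k, r in enumerate(cut))              # r_k[k] of lemma 3.4
print(pairwise_ok, c == diagonal)
print(causes(r2, r3), consistent(r2, r3))   # not causally related, yet consistent
print(consistent(r2, (3, 0, 1, 0)))         # built on a different revision of D1
```

The pair r2, r3 shows why consistency is weaker than causal relationship: neither revision was derived from the other, but both rest on the same revision of D1.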

Lemma 3.5 A consistent cut C is causally related to each revision in C.
$\forall i : C(D_i) \rightarrow C$.

Proof: Let $R_i = C(D_i)$ and $R_j = C(D_j)$. Let also $r_i$, $r_j$ be the compatibility vectors of revisions $R_i$ and $R_j$, respectively. Due to lemma 3.3, $R_i \leftrightarrow R_j$ holds, which can be transformed to:
$\forall j : (r_i[j] \ne 0 \wedge r_j[j] \ne 0) \Rightarrow r_i[j] = r_j[j]$.
The local time of a revision is always a positive value: $r_j[j] \ne 0$, and due to lemma 3.4 $r_j[j] = c[j]$; thus we can transform that statement to:
$\forall j : (r_i[j] \ne 0 \Rightarrow r_i[j] = c[j])$
which is the definition of $R_i \rightarrow C$. □

The definitions and concepts introduced so far are based on the vector clocks developed independently by Fidge and Mattern [Fid88], [Mat88]. Our application to a dag (directed acyclic graph) of documents is a special case, with the component values of any consistent cut being the same in each revision which is part of that cut. The rest of the paper describes some entirely new approaches to configuration management based on vector clocks. We now describe the construction of consistent cuts, but we also allow for incomplete revision sets, i.e., we use the term configuration for a partial mapping of documents into revisions, which can also be described by a compatibility vector. The definition of the causal relationship $\rightarrow$ between revisions may be extended to compatibility vectors of configurations.

Theorem 3.1 Let d be the dependency vector of document D, let c and r be the compatibility vectors of the configuration C and of revision R of document D, respectively, and let $\odot$ be the componentwise product. Then we can decide whether a given revision R is included in the configuration C by comparing the compatibility vectors:
$c \odot d = r \Leftrightarrow C(D) = R$.

Proof for $\Rightarrow$: Because of lemma 3.4, in a consistent cut $\forall j : c[j] > 0$ holds and $(c \odot d)[j] = 0 \Leftrightarrow d[j] = 0$, so $c \odot d$ describes the vector of some revision of $D_i$ (writing $D = D_i$; see lemma 3.1) and $(c \odot d)[i]$ is the local time of $D_i$ when the corresponding revision was created. There is only one revision with that local time, because each time another revision is created, the vector clock of $D_i$ is incremented. As a result, it would be sufficient to compare $(c \odot d)[i]$ with $r[i]$. If c is the vector of an incomplete configuration, $\exists k : c[k] = 0$, but for these k also $d[k] = 0$, because generating revision R is not possible if document $D_k$ is needed ($d[k] = 1$) but not present in C ($c[k] = 0$). Thus $c[k] = 0 \Rightarrow d[k] = 0$, and our initial assumption $(c \odot d)[j] = 0 \Leftrightarrow d[j] = 0$ also holds for incomplete configurations.

Proof for $\Leftarrow$: Assume that $C(D) = R$; then lemma 3.5 guarantees $R \rightarrow C$, which (by definition 3.2) is equivalent to $\forall j : r[j] \ne 0 \Rightarrow r[j] = c[j]$. Lemma 3.1 says that $\forall j : r[j] \ne 0 \Leftrightarrow d[j] = 1$, thus $\forall j : d[j] = 1 \Rightarrow r[j] \ne 0 \Rightarrow r[j] = c[j] \Rightarrow r[j] = (c \odot d)[j]$. □

We will now define how the compatibility vector of a subsequent configuration is derived from the current compatibility vector, after a revision has been replaced or an additional revision has been added, respectively.

Definition 3.6 Let $C'$ be a configuration which has been developed from configuration C by replacing revision R with $R'$ for document $D_i$. The new compatibility vector $c'$ for the subsequent configuration $C'$ has the values:
1. $c'[j] := r'[j]$ if $i = j$.
2. $c'[j] := c[j]$ if $d_j[i] = 0$.
3. $c'[j] := 0$ otherwise, because the documents $D_j$ depending on $D_i$ do not have any information about $R'$ at that time.

Lemma 3.6 Let $C'$ be a configuration which has been developed from configuration C by adding revision $R_j$ for document $D_j$. (So we assume $c[j] = 0$.) Let $r_j$ be the compatibility vector of revision $R_j$. Then the new compatibility vector $c'$ for the subsequent configuration $C'$ has the values:
$c'[i] := r_j[i]$ if $i = j$ and $c'[i] := c[i]$ otherwise.

Proof: According to definition 3.6 the subsequent configuration has a compatibility vector with the values:
1. $c'[i] := r_j[i]$ if $i = j$.
2. $c'[i] := c[i]$ if $d_i[j] = 0$.
3. $c'[i] := 0$ if $d_i[j] = 1 \wedge i \ne j$.
In case 3), due to our assumption $c[j] = 0$, the condition $d_i[j] = 1 \Rightarrow c[i] = 0$ holds, because in C there must not be a revision of a document $D_i$ depending on $D_j$, which has no revision in C at all ($c[j] = 0$). Thus it is sufficient to define: $c'[i] := c[i]$ if $i \ne j$. □
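Theorem 3.1 and definition 3.6 can be tried out on the example project. The Python sketch below is an illustration only: the function names are hypothetical, the dependency vectors are those of Fig. 1, and the numeric values assume a = 2, b = 1, c = 1 for the symbolic entries of Fig. 2.

```python
from typing import Dict, Tuple

Vector = Tuple[int, ...]
DEP: Dict[int, Vector] = {1: (1, 0, 0, 0), 2: (1, 1, 0, 0), 3: (1, 0, 1, 0), 4: (1, 1, 1, 1)}

def mask(c: Vector, d: Vector) -> Vector:
    """Componentwise product of a configuration vector and a dependency vector."""
    return tuple(x * y for x, y in zip(c, d))

def in_configuration(c: Vector, doc: int, r: Vector) -> bool:
    """Theorem 3.1: revision R of document D is in C iff the masked vector equals r."""
    return mask(c, DEP[doc]) == r

def replace_revision(c: Vector, i: int, r_new: Vector) -> Vector:
    """Definition 3.6: replace the revision of D_i by one labelled r_new."""
    out = []
    for j in range(1, len(c) + 1):
        if j == i:
            out.append(r_new[j - 1])     # case 1: take the new local time
        elif DEP[j][i - 1] == 0:
            out.append(c[j - 1])         # case 2: D_j does not depend on D_i
        else:
            out.append(0)                # case 3: dependents have to be rebuilt
    return tuple(out)

c = (2, 1, 1, 1)                         # cut vector of Fig. 2 with assumed values
c_new = replace_revision(c, 2, (2, 2, 0, 0))
print(in_configuration(c, 2, (2, 1, 0, 0)), in_configuration(c, 2, (2, 2, 0, 0)))
print(c_new)                             # (2, 2, 1, 0): D4 has no revision any more
```

After the replacement the fourth component is zero, so the configuration is partial until a revision of D4 is derived again.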

4 Using compatibility vectors

This section refers to the list in section 1 and explains how compatibility vectors provide a solution to the configuration management tasks listed in the introduction.

CM1 Compatibility vectors can be used to distinguish revisions; each vector uniquely identifies one revision. In a project database, the vectors may be used as a composite key. Thus, variant revisions of a single document are not confused. With respect to the same document $D_i$, different revisions can be distinguished by the local time stamp $r_i[i]$, and revisions of different documents, according to lemma 3.1, always have different zero positions in their vectors. As a result, compatibility vectors of revisions are unique within a whole project.

CM2 Assume that document $D_j$ is generated using $D_k$ and $D_i$. When a version is built and two revisions, say $R_k$ and $R_i$, are created, the question arises whether these revisions are semantically consistent ($R_i \leftrightarrow R_k$) so that they can be used when deriving some revision $R_j$ of $D_j$. The problem of checking the compatibility of revisions of different documents in order to use both as input to one task for a common successor revision is shown in figure 4. In this example the revisions with the vectors $(a, b+1, 0, 0)$ and $(a, 0, c, 0)$, respectively, are compatible because the only component which is non-zero in both vectors is the first one. It has the same value in both vectors, so the corresponding revisions can be used to build a successor labelled $(a, b+1, c, d)$ with $d = 1$ being the local time stamp of document $D_4$, which is not incremented, because in an automatic derivation the construction mechanism of compatibility vectors never uses rule 2 (in def. 3.1) and $CLK_4[4]$ remains at its initial value.

CM3 Suppose that, as a result of some test or inspection, document D in an existing product version has to be changed. Consequently, a different version of the product has to be built, which contains not only the new revision of D but also new revisions of all documents that depend on D. Thus, it has to be determined which subsequent steps are necessary to complete the product version including the changes made to D. Figure 3 shows the resulting configuration of changing a revision of document D3. According to definition 3.6, there is no revision for D4. In general, the revisions to be deleted after changing the revision of $D_j$ correspond to the documents in the j-th row of the dependency matrix.

Fig. 3 Modifying a revision (documents D1 to D4 with depon edges; revisions R11, R22, R31 attached by belongsto edges; r1 = (a,0,0,0), r2' = (a,b+1,0,0), r3 = (a,0,c,0); D4 has no revision)

CM4 On the other hand, revisions derived automatically and from identical predecessors will be labelled with identical compatibility vectors. In an automated derivation using a compiler or linker, the local time stamp is not incremented and the only information is taken from the revisions used as inputs (see fig. 4).

CM5 One can also examine a system and check whether a given revision is included. If somebody wants to know whether certain features are included in the product version, and an answer is only possible knowing the source code, the inclusion of a certain software version R in configuration C has to be checked. Using lemma 3.5 the relation contains may be defined as:
$contains(R, C) :\Leftrightarrow R \rightarrow C$.

CM6 Imagine that typical situation after a software product has been shipped: Change requests related to bug reports from customers operating a known program version C are the feedback for the elements used to build C. But which elements have been used when building C ? Identifying revisions of a document D1 within a given system version only requires the reconstruction of the compatibility vector as shown in theorem 3.1. The result of the multiplication can be used to identify the revision of D1 which has been used when building the given system version.

;

defines(D C) = R :, d  c = r using the componentwise product .

n

J belongsto D2 K A  (a,b+1,0,0) = r2 R22 depon A A  A  A depon  A  A  A  (a,b+1,c,1) = r4 = c (a,0,0,0) = r1 A  

J belongsto J D1 D4 belongsto AK  R42 R11 A depon A  A  A depon  A c = (a,b+1,c,1)  A  A  A  (a,0,c,0) = r3 J belongsto D3 0

n

n

0

0

n

R31

Fig. 4 Rebuilding the system 0

;

;;

In figure 4 the vector c = (a b + 1 c 1) identifies each element of the software version. In the compatibility vector of the revision of document D2 used when building the current revision of D4 , only the first two components have non-zero values. So, by inserting zeroes into the third and fourth position, (a b + 1 0 0) is reconstructed, which uniquely identifies the revision in question in the whole project history and among all documents of the project.


The mapping belongsto is defined using lemma 3.1:

$belongsto(R) = D :\Leftrightarrow (\forall k : r[k] = 0 \Leftrightarrow d[k] = 0)$
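As a minimal sketch of this mapping, the following Python fragment compares the zero pattern of a revision vector with that of each document vector. The document vectors are hypothetical 0/1 masks chosen to match the labels in figure 4, and the helper names `same_zero_pattern` and `belongs_to` are illustrative, not part of the paper.

```python
from typing import Dict, Optional, Tuple

Vector = Tuple[int, ...]


def same_zero_pattern(r: Vector, d: Vector) -> bool:
    """True if r[k] == 0 exactly when d[k] == 0, for every position k."""
    return len(r) == len(d) and all(
        (rk == 0) == (dk == 0) for rk, dk in zip(r, d)
    )


def belongs_to(r: Vector, documents: Dict[str, Vector]) -> Optional[str]:
    """Return the name of the document whose zero pattern matches r, if any."""
    for name, d in documents.items():
        if same_zero_pattern(r, d):
            return name
    return None


# Hypothetical document vectors matching the zero patterns of figure 4.
documents = {
    "D1": (1, 0, 0, 0),
    "D2": (1, 1, 0, 0),
    "D3": (1, 0, 1, 0),
    "D4": (1, 1, 1, 1),
}

# A revision labelled (a, b+1, 0, 0), here with a = 3 and b = 1.
print(belongs_to((3, 2, 0, 0), documents))
```

The lookup only inspects which components are zero, so the actual time stamp values play no role in deciding which document a revision belongs to.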

5 Related work

Westfechtel explains revision management and consistency control using a two-level approach with (coarse- and fine-grained) graph representations in an integrated software development environment [Wes91][Nag93]. He describes very useful enactable implementations with a flexible multi-level architecture [Wes91][Wes94]. Because fine-grained constructs rest on assumptions about languages, they reduce the practical use of process models. We do not restrict our graph model to any representation or file format, which would be an inevitable consequence of fine-grained representations. Instead, we use coarse-grained modeling in a very general approach. Zanzinger has demonstrated the general applicability of coarse-grained representations [Zan92]. Our first proposal for a generic coarse-grained process model for quality assurance purposes can be found in [KH95]. In that paper, we introduce project flow graphs as a general and flexible notation for modeling and comparing configuration management systems.

With ADELE, Estublier presented a version management system with an intensional mechanism for building configurations [Est94]. Its selection scheme is more general because it allows for dynamic dependency schemes and for imposing general predicates on version labels. However, ADELE only supports one stage of the software life-cycle (selecting source code versions) and will neither identify objects or executables nor reconstruct their sources.

Deciding about the consistency of global states in distributed systems using vector clocks was developed independently by Fidge [Fid88] and Mattern [Mat88]. A good introduction to vector clocks can also be found in the book by Singhal and Shivaratri [SS94]. All of these authors only describe applications to distributed processing in general. Our paper applies vector clocks to the software engineering process and presents a (re)construction mechanism describing the consistency of product versions. As far as we know, vector clocks have not yet been applied to the management of technical development processes.

6 Conclusion

We have presented a formal model for various configuration management (CM) activities which is independent of particular CASE methods, tools or life-cycle models. We separated the concept of a filename (document, abstract part of a technical system) from the concept of a file content (revision, snapshot of a document during development) and only allowed for monotone revision sets, the so-called configurations, which make up the product versions. We used an intensional approach and defined a labelling technique which is able to solve typical CM problems in technical development projects. The concept of compatibility vectors defines a mapping between revisions and configurations. Compatibility vectors can automate some of the configuration management tasks that have been manual up to now, thus removing the cause of many typical errors within software engineering projects. We suggest that the method proposed in our paper be used to modify “make” such that manually edited documents can be included in a “makefile”, too. Additionally, variant configurations of the same product should be managed in parallel. In order to support change tracking, the identification of source revisions must be possible in such an improved “make” tool.

Acknowledgements

We would like to thank Prof. Dr. H. J. Schneider, chairman of the institute for programming languages and compilers at the University of Erlangen-Nuremberg, for supporting our research. We are indebted to Bernhard Westfechtel at RWTH Aachen for his valuable comments on our work. We also thank BASYS GmbH, Erlangen and the German Federal Department for Research and Education for the financial support.

References

[Est94] J. Estublier. The Adele configuration manager. In Tichy, editor, Configuration management. John Wiley & Sons, New York, 1994.

[Fel79] S. I. Feldman. Make - a program for maintaining computer programs. Software - Practice and Experience, 9(4), pp. 255–265, 1979.

[Fid88] C. J. Fidge. Timestamps in message-passing systems that preserve partial ordering. In Proc. 11th Australian Computer Science Conference, pp. 56–66, February 1988.

[Int91] International Organization for Standardization. ISO 9000, part 3. Geneva, 1991.

[KH95] D. Kips and G. Heidenreich. Project flow graphs - a meta model to support quality assurance in software engineering. In FUCaM, editor, Proc. Int. Conf. on Industrial Engineering and Production Management ’95, volume 1, pp. 347–357, Mons, Belgium, April 1995. FUCaM.

[Mat88] F. Mattern. Virtual time and global states of distributed systems. In M. Cosnard, editor, Proc. Parallel and Distributed Algorithms, pp. 215–226, 1988. Reprinted in: Z. Yang, T. A. Marsland (eds.), Global States and Time in Distributed Systems, IEEE, 1994, pp. 123–133.

[Nag93] M. Nagl. Eng integrierte Software Entwicklungs Umgebungen (Tightly integrated software development environments). Informatik Forschung und Entwicklung, (8), pp. 105–119, 1993.

[SS94] M. Singhal and N. G. Shivaratri. Advanced Concepts in Operating Systems. McGraw-Hill Series in Computer Science. McGraw-Hill, New York, 1994.

[Tic85] W. F. Tichy. RCS - a system for version control. Software - Practice & Experience, 15(7), pp. 637–654, July 1985.

[Wes91] B. Westfechtel. Revisions- und Konsistenz-Kontrolle in einer integrierten Software-Entwicklungsumgebung (Revision and consistency control in an integrated software development environment). Number 280 in Informatik Fachberichte. Springer, Berlin, 1991.

[Wes94] B. Westfechtel. Using programmed graph rewriting for the formal specification of a configuration management system. In Proc. Workshop on Graphtheoretic Concepts WG 94, number 903 in Lecture Notes in Computer Science, pp. 164–179, Berlin, 1994. Springer.

[Zan92] M. Zanzinger. ROSE - Konzeption und Implementierung einer Programmierumgebung fuer rechnerorganisierte Software-Entwicklung. PhD thesis, Universitaet Erlangen-Nuernberg, 1992.
