2014 11th International Conference on Information Technology: New Generations
GALO: A Semantic Method for Software Configuration Management
Juliano de Almeida Monte-Mor
Adilson Marques da Cunha
Advanced Campus of Itabira Federal University of Itajuba - UNIFEI Itabira/MG, Brazil
[email protected]
Computer Science Division Brazilian Aeronautics Institute of Technology - ITA Sao Jose dos Campos/SP, Brazil
[email protected]
If changes to an artifact are not properly propagated to every place where its content is replicated, the result may be inconsistencies within the storage media that take a long time to be discovered.
Abstract-During software development, several interrelated artifacts are created. Although existing Software Configuration Management (SCM) tools provide version control of artifacts, current mechanisms do not fully support reuse and change propagation between artifacts, because they focus mainly on source code and its compilation. Few efforts have been made to provide suitable versioning of artifacts in structured or semi-structured formats. This paper proposes a method named GALO, which seeks to provide automated knowledge configuration management within Software Engineering projects. To validate the proposed method, we developed a case study involving both the creation of composite artifacts and the traceability of changes between artifacts.
The problem addressed in this research is that current SCM methods do not provide appropriate support for artifact reuse at a finer version granularity. Besides the inherent interoperability and integration of Semantic Web standards, applying Semantic Web technologies in Software Engineering projects can also provide some formalism for reuse and for automated control of the impact of changes between versioned artifacts. Within this context, this research proposes a new method for SCM, based upon the Semantic Web architecture and standards, in order to provide more efficient knowledge configuration management by enabling verification, validation, inference, and the discovery of new knowledge during software development processes. This study considers the configuration control of artifacts described mainly in the RDF and OWL languages.
Keywords-Software configuration management, semantic web, reusability.
I. INTRODUCTION
Software product development usually involves complex interrelated artifacts, such as requirements, designs, test cases, source code, compilation scripts, and others. These artifacts are constructed through interactions between analysts, developers, testers, and others, often through shared file repositories.
Despite the advantages of applying common Software Configuration Management (SCM) tools to artifact versioning, current approaches do not consider, in their versioning structures, the semantics expressed by relationships between artifacts. This occurs mainly because these tools remain focused on source code development and compilation processes.
Shahri et al. [6] have also proposed a new formalism for configuration management, based on approaches from the knowledge representation field. In their work, components and version restrictions were encoded in an OWL-DL ontology, facilitating the sharing of configuration knowledge across multiple systems.
According to Hoek et al. [1], an SCM system needs to support relationship models that take artifact versions into account. The absence of an appropriate architecture for storing and retrieving the parts or elements described within artifacts prevents these elements from being correctly related to other artifacts, limiting the reuse and sharing of data, information, and knowledge.
Guerrieri [7] has presented an approach for Software Engineering projects that considers the reuse of software artifacts and uses the XML format to provide an appropriate reuse framework.
System documentation may contain inconsistencies, for example where information was reused manually by duplicating it across several artifacts (e.g., business rules, requirements, use cases, and others). Changes in requirements may require updates to various artifacts, such as text documents, diagrams, source code, and others.
978-1-4799-3187-3/14 $31.00 © 2014 IEEE DOI 10.1109/ITNG.2014.66
II. RELATED WORKS
Zeller and Snelting [2] have proposed a unified versioning approach, seeking integration between SCM tools and processes. Kitcharoensakkul and Wuwongse [3] have introduced a unified representation and computation model for SCM, based on the RDF language and the equivalent transformation computational model [4], and applied RDF programs [5] to build computational and reasoning mechanisms.
Ambrosio et al. [8] specified a tool that applies ontologies to produce up-to-date documentation. The proposed approach considers two ontologies: one for the artifact classification structure and another to describe concepts by establishing semantic relationships between documents and their contents. In this case, the artifacts serve as a connection between these ontologies.
In this case, ca and cb refer to components a and b of the product P, and this relation is considered irreflexive, asymmetric, and transitive.
This corresponds to a whole-part relationship type. In this case, unlike composition, the part does not exist without the whole. It allows components to be reused (included) to build new components, but without controlling their life cycles. The components are aggregated in read-only mode, so any changes must be made by accessing them directly.
Nguyen et al. [9] described the application of an object-oriented SCM infrastructure, called Molhado, to the construction of a multi-level SCM system that controls source code and structured documents. In this work, XML was also adopted as the standard format for describing structured documents.
The aggregation relationship allows the construction of a component hierarchy, where leaf nodes represent atomic components without aggregation relationships, and intermediate nodes are compound components containing at least one aggregation relationship. Figure 1 illustrates the relationships allowed within the proposed product space.
Arantes and Falbo [10] have proposed an infrastructure for managing semantic documents based on Semantic Web technologies, enabling the addition of metadata annotations to documents, the traceability of data evolution, the versioning of data extracted from semantic documents, and change notifications for these data. According to Kitcharoensakkul and Wuwongse [3], beyond controlling the evolution of software objects, it is also necessary to control the versions of their inter-relationships. In that work, AND/OR graphs [12] are typically adopted to represent these relationships. The authors classify the relationships between objects into two types: composition, a relationship between software products and their components; and dependency, a relationship between components of software products whose contents must be consistent with the contents of their masters. In [11], Kitcharoensakkul and Wuwongse have also introduced another approach to software composition, based on constraints and item retrieval, in which users specify the names and properties of the desired components. After the items of interest are retrieved, they are checked against the constraints in order to assess potential conflicts.
Figure 1. Relationships example in the product space.
In this example, the product P1 is composed of three components {C1, C2, C3}. The content of C2 is attached to C1 in order to complete it. Moreover, an explicit dependency is set up between components C1 and C3.
III. THE GALO METHOD DESCRIPTION
The versioning model of the GALO method is described below in terms of its product and version spaces [12], covering the organization and the versions of the components of a software product. The proposed method provides automated, flexible, and optimized configuration management (in Portuguese: Gerencia de configuracao Automatizada, fLexivel e Otimizada – GALO).
The dependency relationship between C1 and C3 states that changes in the content of C3 indirectly affect the content of C1. This also occurs implicitly through the aggregation relationship, where changes in C2 directly cause changes in the content of C1. The composition and aggregation relationships provide modularity and encapsulation in the construction of components. However, composite components need to be seen by users as units, in a transparent way. Thus, they should be delivered as complete modules, containing both their own and the aggregated contents.
A. Product Space
Considering the product space, one can define a software product P as a set of interrelated components Ci, with 1 ≤ i ≤ n:
P = {C1, C2, C3, C4, ..., Cn}
In this regard, besides configuration management at the software product level, it is also necessary to control configuration at the component level. For this reason, every component also has a ConFiguration (CF) that organizes its content and its relationships with other components.
In this case, composition and dependency relationships, as defined in [12] and [3], allow the definition of both the structure of a software product and the relationships between its components. However, these relationships cannot express reuse between components. The composition relation has been applied only to construct the product from its components, ignoring the internal units of a component. Thus, to obtain a finer granularity, we consider a weaker type of composition relationship, named aggregation RAG:
Figure 2 shows the graph representing the product space from the example shown in Figure 1.
RAG ⊆ P × P = {(ca, cb) | ca, cb ∈ P ∧ ca cb},
variant v1, represented by version 1.2, whose baseline is the revision r1 (version 1.1), starts a secondary branch in the development of C1 without generating a delta in its main branch. It represents a duplication (or branching) of version r1 in P1's product space, registered as a change in P1's logical model. This produces a new version of P1, whose delta incorporates v1 into P1's configuration; that is, v1 corresponds to the addition of a new component of P1 whose initial baseline is r1. At the product level, creating a branch from some revision r2 of P1 introduces a new product P2 into the universe of products, whose first version contains all the components contained in version r2. In this case, no change is made to P1's configuration.
Figure 2. A product space example.
Regarding the semantics of the aggregation and composition relationships, we consider that a composite component i consists of the subgraph of the product graph formed by all items transitively accessible from i through composition and aggregation relationships.
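The transitive reading of a composite component can be illustrated with a small sketch. It assumes a hypothetical encoding of the Figure 1 example as labeled edges; the function name, the edge list format, and the choice to exclude the starting item from the result are illustrative assumptions, not part of the GALO method itself.

```python
from collections import deque

# Hypothetical encoding of the Figure 1 example as (source, relationship, target).
# Only the relationship names and the items P1, C1, C2, C3 come from the text.
EDGES = [
    ("P1", "composition", "C1"),
    ("P1", "composition", "C2"),
    ("P1", "composition", "C3"),
    ("C1", "aggregation", "C2"),
    ("C1", "dependency", "C3"),
]


def composite_items(root, edges, kinds=("composition", "aggregation")):
    """Return every item transitively reachable from root via the given edge kinds."""
    reachable = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for src, kind, dst in edges:
            if src == node and kind in kinds and dst not in reachable:
                reachable.add(dst)
                queue.append(dst)
    return reachable


print(sorted(composite_items("C1", EDGES)))
print(sorted(composite_items("P1", EDGES)))
```

Dependency edges are deliberately ignored here, since only composition and aggregation contribute to the content of a composite component.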
Changes may be propagated between branches through merging. The representation proposed in Figure 3 supports these operations by means of three-way merge tools [12]. In this case, the variant's baseline may be used in the comparison between alternative versions.
B. Version Space
According to the taxonomy presented in [12], the GALO versioning model can be classified as extensional and change-based. Versions are explicitly generated by users and described in terms of the changes made relative to the previous version.
To facilitate the identification of branches and the retrieval of baselines or project releases, labels (i.e., tags), such as literals or URIs, can be attached to variants or revisions. This can be done at both the component and the product level.
C. Product and Version Space Integration
An AND/OR graph is adopted to integrate the product and version spaces, based on the "First Version" approach [12], in which the selection of a product version determines the versions of its components. In the proposed graph, AND nodes represent only the composition and aggregation relationships required to generate a complete product or component.
Implicitly, we assume the existence of a version 0.0, called the initial baseline, which is empty for every new Configuration Item (CI). This version serves as the baseline for the first version (1.0). Moreover, we adopt directed deltas to represent the differences between two subsequent versions. Therefore, a specific version is obtained by applying a sequence of change operations (stored as deltas) to the initial baseline. In the GALO method, the version space is represented as an acyclic, two-dimensional version graph [12]. Figure 3 illustrates the version graph of a component, containing versions of the variant and revision types, where successive versions are connected by a successor relationship.
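As a minimal sketch of how a version could be rebuilt from the empty baseline, the following code replays directed deltas along a single branch. The history format (version id, added items, removed items) and the sample contents are assumptions made for illustration; variants and branching are not modeled.

```python
def build_version(history, target):
    """Replay directed deltas, in order, on the empty baseline 0.0 until target."""
    content = set()  # version 0.0: the empty initial baseline
    for version_id, added, removed in history:
        content = (content - set(removed)) | set(added)
        if version_id == target:
            return content
    raise KeyError(f"version {target!r} not found in history")


# Hypothetical main-branch history of one configuration item.
HISTORY = [
    ("1.0", {"item A", "item B"}, set()),  # initial content added to the baseline
    ("1.1", {"item C"}, {"item A"}),       # one item added, one removed
]

print(sorted(build_version(HISTORY, "1.0")))
print(sorted(build_version(HISTORY, "1.1")))
```

The delta for version 1.0 is simply the initial content, which matches the role of the empty baseline described above.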
Figure 4 illustrates a proposed AND/OR graph, including the three versions of the software product P1, with the version 1.1 discussed in the example shown in Figure 1.
Figure 3 highlights version 1.0 of the CI, which succeeds the empty baseline. The delta between these versions is the initial content of the CI, which must be added to the initial baseline to construct the first version.
Figure 4. The AND/OR proposed graph.
The change that generated version 1.0 of P1 involved adding versions 1.0 of components C1, C2, and C3 to version control. The second change in the product led to changes in C1, generating its version 1.1, with the addition of two relationships: the aggregation of C2 and the dependency on C3. Finally, the change that originated version 1.2 of P1 covered changes in the content of C2, which has version 1.1 in this configuration.
Figure 3. A version graph example.
Variants allow the creation of secondary development branches, whose baselines refer to the original revisions. The proposed version space model supports the creation of branches at both the component and the product level. At the component level, considering a component C1 of some product P1, as illustrated in Figure 3, the creation of the
addition, specific labels may be added, based on the Baseline class, to facilitate the identification and retrieval of versions.
Regarding versioning granularity from the user's point of view, the GALO method adopts the "Product Versioning" approach [12]. Products are organized into a global and uniform space, encapsulating versions and relationships between components, as highlighted in Figure 4.
A revision has a predicate configuration, whose value can be of the type ProductConfiguration or ComponentConfiguration. The product's configuration model contains the predicate aggregationDelta, whose value indicates the changes in the components attached to the product, and may also contain the predicate associationDelta, involving changes in dependency relations with other products. The component's model, besides the above predicates, can also contain the predicate compositionDelta, expressing the differences caused by changes in its content.
This example shows which elements are recovered when version 1.2 of P1 is selected. Note that, although C2 is at version 1.1 in this configuration, version 1.0 of C2 is also contained in the configuration, since its content was added to C1 as part of C1's configuration. Thus, as noted earlier, in addition to providing a finer granularity and improving the reuse of CIs, the GALO method can also provide better control of the impact of changes on the CIs. As emphasized by Shahri et al. [6], the RDF and OWL languages, along with a reasoner, can yield inference services such as consistency checking, concept satisfiability, classification, and realization.
For artifacts in RDF, the composition deltas include two sets of triples, named Dataset. They can be applied to RDF graphs following the semantics of set-theoretic operations, where triples t are added (Add(t)) to or removed (Del(t)) from the original graph G1 to construct its subsequent version G2 [13, 14]:
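The selection of P1's version 1.2 can be sketched as follows. The two dictionaries are a hypothetical encoding of the example around Figure 4 (product versions mapped to component versions, and component versions mapped to the versions they aggregate); the names and data layout are assumptions, and a real implementation would query the ontology instead.

```python
# Hypothetical tables derived from the P1 example: which component versions
# each product version selects, and which versions a component version aggregates.
PRODUCT_VERSIONS = {
    "1.0": {"C1": "1.0", "C2": "1.0", "C3": "1.0"},
    "1.1": {"C1": "1.1", "C2": "1.0", "C3": "1.0"},
    "1.2": {"C1": "1.1", "C2": "1.1", "C3": "1.0"},
}
AGGREGATES = {("C1", "1.1"): [("C2", "1.0")]}


def resolve(product_version):
    """Return all (component, version) pairs recovered for a product version."""
    selected = set(PRODUCT_VERSIONS[product_version].items())
    stack = list(selected)
    while stack:
        item = stack.pop()
        # Follow aggregation links so that attached content is also delivered.
        for part in AGGREGATES.get(item, []):
            if part not in selected:
                selected.add(part)
                stack.append(part)
    return selected


print(sorted(resolve("1.2")))
```

Under these assumptions, both versions of C2 appear in the result for P1 1.2: version 1.1 as a direct component and version 1.0 through its aggregation into C1 1.1.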
Δt(G1 → G2) = {Add(t) | t ∈ G2 − G1} ∪ {Del(t) | t ∈ G1 − G2}
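A minimal sketch of this delta, assuming RDF graphs are modeled as plain Python sets of (subject, predicate, object) tuples, is shown below. The function names and the sample triples are illustrative only; no RDF library is used.

```python
def triples_delta(g1, g2):
    """Compute the delta from G1 to G2 as a set of ('Add', t) and ('Del', t) operations."""
    return {("Add", t) for t in g2 - g1} | {("Del", t) for t in g1 - g2}


def apply_delta(graph, delta):
    """Apply a delta to a graph: remove the Del triples, then add the Add triples."""
    added = {t for op, t in delta if op == "Add"}
    removed = {t for op, t in delta if op == "Del"}
    return (graph - removed) | added


# Hypothetical triples for two subsequent versions of an artifact.
G1 = {("ex:req1", "ex:title", "Login"), ("ex:req1", "ex:priority", "low")}
G2 = {("ex:req1", "ex:title", "Login"), ("ex:req1", "ex:priority", "high")}

delta = triples_delta(G1, G2)
print(sorted(delta))
print(apply_delta(G1, delta) == G2)
```

Because the delta is built from set differences, applying it to G1 always reproduces G2, which is the property the version graph relies on.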
In the case illustrated in Figure 4, the proposed representation makes it possible to infer automatically, for example, that version 1.0 of C2 is also part of the configuration of version 1.2 of P1. Likewise, for version 1.1 of C1, the successor relationship can be used to detect automatically the existence of a new version of C2.
Aggregation and dependency deltas need to be expressed as operations that add or remove versions, recorded as changes in the configuration of components. They can be defined similarly to the composition delta, but over versions instead of triples:
To represent the GALO versioning model, we have defined a new ontology covering the key concepts needed for an extensional, change-based SCM. It was described using the ontology language OWL-DL and is briefly presented in Figure 5.
Δv(CF1 → CF2) = {Add(v) | v ∈ CF2 − CF1} ∪ {Del(v) | v ∈ CF1 − CF2}
In this case, CF1 and CF2 refer to the first and second configurations of a component C1 ∈ P, and v corresponds to a version of a component C2 ∈ P, such that C1 C2. The delta Δv indicates which versions of other components were removed from the original version and which were incorporated into the next version of the component.
Within the ontology of Figure 5, a Product is created by a User and described by various properties, such as label, description, creation date, and others. It also has a predicate called changes, whose value indicates the first of a sequence of changes to this product. Each Change, in turn, has a predicate logicalChange, whose value refers to the respective product version, identified by the vid property, which corresponds to a set of changes to individual components.
Thus, for the new proposed ontology shown in Figure 5, the class TriplesDelta refers to the delta type expressed by Δt, while the class VersionsDelta represents deltas expressed as Δv.
A Version has a predicate successor, indicating the next version of the product. Thus, one can construct a particular version of a product by applying the successive changes recorded up to that point in its version graph.
In addition to the constraints shown in Figure 5, some axioms are used to express complex restrictions and to allow the inference of new knowledge. We used first-order logic as the formalism to express these axioms. Below, we briefly describe some of the axioms of the proposed ontology.
Besides corresponding to a change in a product, a version may also represent a state in the version graph of an individual component. In this case, the version predicate referTo indicates the element represented by the version, i.e., a change or a component.
Given a new predicate called subComponentAdded, for a particular revision r1 of some component C2, a revision r2 of another component C1 is added as a subcomponent of r1 if and only if this relationship was added to the configuration of r1 or of any of its predecessors:
Like a product, a component also has a label and a description. To track its state, it has a predicate called versions, which points to its version 1.0. Its subsequent versions are accessed via the successor predicate of the Version class.
(∀ r1, c) configuration(r1, c) ∧ (∃ d, r2) aggregatedDelta(c, d) ∧ versionAdded(d, r2) ↔ subComponentAdded(r1, r2)
(∀ r1, r2) subComponentAdded(r1, r2) ∧ (∃ r3) successor(r1, r3) → subComponentAdded(r3, r2)
A version can be a Variant or Revision, where the original revisions of a variant are indicated by its predicate baseline. In
Figure 5. The new ontology representing the concepts involved in the GALO method.
A revision r2 of C1 is removed as a subcomponent of a revision r1 of C2 if and only if this relationship was removed from the configuration of r1 or of any of its predecessors.
(∀ r1) (∃ r2) subComponentAdded(r1, r2) ∧ ¬ subComponentRemoved(r1, r2) ↔ subComponent(r1, r2)
(∀ r1, c) configuration(r1, c) ∧ (∃ d, r2) aggregatedDelta(c, d) ∧ versionRemoved(d, r2) ↔ subComponentRemoved(r1, r2)
The predicates subComponentAdded, subComponentRemoved, and subComponent express irreflexive, asymmetric, and transitive relationships. Similarly, it is also possible to define the following predicates to represent the dependency between artifacts: dependencyAdded, dependencyRemoved, and dependency.
(∀ r1, r2) subComponentRemoved(r1, r2) ∧ (∃ r3) successor(r1, r3) → subComponentRemoved(r3, r2)
Once the predicates dependency and subComponent are defined, it becomes possible to control the propagation of changes between artifact versions more suitably, by identifying the versions that can be upgraded due to changes made to related artifacts. Thus, a revision r1 is affected by another revision r2 if and only if r2 is a dependency or a subcomponent of r1:
An aggregation can only be removed from a configuration if it has been previously added:
(∀ r1) subComponentRemoved(r1, r2) → (∃ r2) subComponentAdded(r1, r2)
Based on the predicates subComponentAdded and subComponentRemoved, one can then infer the predicate subComponent, expressing the whole-part relationship between components:
(∀ r1, r2) dependency(r1, r2) ∨ subComponent(r1, r2) ↔ affectedBy(r1, r2)
A revision r1 of C1 has as a subcomponent a revision r2 of another component C2 if and only if this relationship was added at some point to r1's configuration (or to that of one of its predecessors) and has not been removed up to the point indicated by r1 in the version graph of C1.
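The inference of subComponent from the added and removed relationships can be sketched procedurally. The revision table below is hypothetical (identifiers such as "C1@1.1" and the tuple layout are assumptions), and the code walks the predecessor chain directly instead of using a reasoner.

```python
# Hypothetical revisions of a component C1:
# revision -> (predecessor, versions added, versions removed) in its configuration.
REVISIONS = {
    "C1@1.0": (None, set(), set()),
    "C1@1.1": ("C1@1.0", {"C2@1.0"}, set()),
    "C1@1.2": ("C1@1.1", {"C3@1.0"}, {"C2@1.0"}),
}


def sub_components(revision, revisions):
    """Subcomponents of a revision: added in it or a predecessor, and not removed."""
    added, removed = set(), set()
    while revision is not None:
        predecessor, plus, minus = revisions[revision]
        added |= plus     # subComponentAdded, inherited along the successor chain
        removed |= minus  # subComponentRemoved, inherited along the successor chain
        revision = predecessor
    return added - removed


print(sub_components("C1@1.1", REVISIONS))
print(sub_components("C1@1.2", REVISIONS))
```

As in the axioms, a removal recorded anywhere along the chain excludes the relationship, so this sketch does not model re-adding a version after it was removed.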
With the goal of keeping the documentation always up to date, the next axiom detects suspicious relationships between versions. It defines a new predicate, named suspectRelationship, which checks whether a component has an aggregation or dependency relationship with an old version of another component. A revision r1 has a suspect relationship with r2 if and only if r1 is affected by r2 and r2 has a successor revision r3:
(∀ r1, r2) affectedBy(r1, r2) ∧ (∃ r3) successor(r2, r3) ↔ suspectRelationship(r1, r2)
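The affectedBy and suspectRelationship axioms can be checked on a small example. The relations below are a hypothetical encoding of the P1 example (C1 1.1 aggregates C2 1.0 and depends on C3 1.0, while C2 already has a version 1.1); the function name and the pair-based representation are assumptions for illustration.

```python
def suspect_relationships(dependency, sub_component, successor):
    """Return pairs (r1, r2) where r1 is affected by r2 and r2 has a successor."""
    affected_by = dependency | sub_component  # affectedBy axiom
    return {(r1, r2) for (r1, r2) in affected_by if r2 in successor}


# Hypothetical facts: sets of (r1, r2) pairs and a successor map.
DEPENDENCY = {("C1@1.1", "C3@1.0")}
SUB_COMPONENT = {("C1@1.1", "C2@1.0")}
SUCCESSOR = {"C1@1.0": "C1@1.1", "C2@1.0": "C2@1.1"}

print(suspect_relationships(DEPENDENCY, SUB_COMPONENT, SUCCESSOR))
```

Here only the aggregation of C2 1.0 is flagged, because C3 1.0 has no successor in the assumed data.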
IV. ANALYSIS AND DISCUSSION OF RESULTS
Figure 8. An example of the subComponent relationship.
In order to validate the GALO method, we developed a case study involving the construction of a prototype system for the configuration control of the artifacts of a Software Engineering project. This prototype was created with the aid of the TopBraid Composer tool [15], an environment for modeling and developing Semantic Web applications.
In the example of Figure 8, a relationship between the requirements artifact and the vision document, related through the predicate visionDocument, is reified. This relationship represents an aggregation, indicated by the value gcs:subComponent for the predicate gcs:type, where the requirements artifact has as a subcomponent version 1.3 of the vision document (identified by the predicate gcs:vid).
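The reified statement described above can be approximated with plain tuples. The terms rdf:Statement, rdf:subject, rdf:predicate, and rdf:object are the standard RDF reification vocabulary, and gcs:type, gcs:vid, and gcs:subComponent come from the text; the ex: resource names and the blank node label are hypothetical, and the actual content of Figure 8 may differ.

```python
def reify(statement_id, subject, predicate, obj, annotations):
    """Build the triples that reify one statement and attach annotations to it."""
    triples = {
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    }
    # Statements about the statement itself, e.g. relationship type and version.
    triples |= {(statement_id, p, o) for p, o in annotations.items()}
    return triples


# Hypothetical resource names for the two artifacts and the linking predicate.
triples = reify(
    "_:stmt1",
    "ex:RequirementsArtifact",
    "ex:visionDocument",
    "ex:VisionDocument",
    {"gcs:type": "gcs:subComponent", "gcs:vid": "1.3"},
)
for triple in sorted(triples):
    print(triple)
```

Keeping the relationship type and version on the reified statement, and not on either artifact, lets the same pair of artifacts carry different relationship metadata in different versions.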
The case study has involved the version control of the following eight software artifacts: stakeholder’s needs, requirements, vision document, document of requests from main stakeholders, glossary, requirements specification, document of supplementary specifications, and requirements artifact. Figure 6 shows these artifacts. Some of them are composed of other artifacts.
From the GALO ontology and its axioms, it became possible to infer new information about the CIs under version control, as exemplified in Figure 9. There, Revision_30 refers to version 1.3 of the document of requests from main stakeholders, which is affected by changes in versions of other components. The relationship with Revision_4 refers to a dependency on version 1.0 of the Glossary. The other revisions represent, in ascending order, the aggregations of versions 1.0, 1.0, 1.1, and 1.0 of the four requirements contained in this document in this case study.
Figure 6. Case study artifacts.
To indicate which artifacts will be submitted to version control, we used a class called Item, as shown in Figure 7.
Figure 9. An example of artifacts traceability.
Finally, Figure 10 shows suspect relationships between versions of some artifacts. These relationships provide a mechanism for controlling the impact of changes on artifacts, making it possible to evaluate which artifacts may need to change due to changes in related artifacts. Thus, the GALO method allows both a finer granularity, supporting more appropriate reuse between artifacts, and greater control over the propagation of changes to artifacts, providing a means to support traceability between software artifacts.
Figure 7. Configuration items of the case study.
Moreover, to indicate what relationships between artifacts correspond to supported aggregation and dependency relations, we have used the reification mechanism of RDF language, which allows making a statement about any other statement, as illustrated in Figure 8.
Figure 10. Suspect relationships between artifacts.
V. CONCLUSION
This work proposed a new method for Software Configuration Management (SCM), based on Semantic Web technologies, that supports more efficient knowledge management in SCM, the reuse of software components, and the traceability of changes across artifact versions.
To increase artifact reuse, we defined a new relationship type named aggregation, which allows component versions to be attached to form composite components. Moreover, the GALO versioning model was represented by means of an ontology to provide better organization and understanding of the concepts involved. The adoption of Semantic Web technology enabled validations, inferences, and the discovery of new knowledge. Based on the proposed model, it was possible to discover all components that may be affected by changes in a given component. It was also possible to identify which components need to be checked (and perhaps modified) due to changes in related components.
REFERENCES
[1] HOEK, A. van der; HEIMBIGNER, D.; WOLF, A. L. Does configuration management research have a future? In: ESTUBLIER, J. (Ed.). . [S.l.]: Springer, 1995.
[2] ZELLER, A.; SNELTING, G. Unified versioning through feature logic. ACM Trans. Softw. Eng. Methodol., ACM, New York, NY, USA, v. 6, n. 4, p. 398-441, Oct. 1997. ISSN 1049-331X.
[3] KITCHAROENSAKKUL, S.; WUWONGSE, V. Towards a unified version model using the resource description framework (RDF). International Journal of Software Engineering and Knowledge Engineering, v. 11, n. 6, p. 675-701, 2001.
[4] AKAMA, K.; SHIMITSU, T.; MIYAMOTO, E. Solving problems by equivalent transformation of logic programs. Journal of the Japanese Society of Artificial Intelligence, v. 13, n. 6, p. 944-952, 1998.
[5] ANUTARIYA, C. et al. Towards computation with RDF elements. In: Internat. Symposium on Digital Libraries. Tsukuba, Japan: [s.n.], 1999. p. 112-119.
[6] SHAHRI, H. H.; HENDLER, J.; PORTER, A. Software configuration management using ontologies. In: Proceedings of the 3rd International Workshop on Semantic Web Enabled Software Engineering at the 4th European Semantic Web Conference (ESWC07). Innsbruck, Austria: [s.n.], 2007.
[7] GUERRIERI, E. Software document reuse with XML. In: Proceedings of the 5th International Conference on Software Reuse. Washington, DC, USA: IEEE Computer Society, 1998. (ICSR 98), p. 246.
[8] AMBROSIO, A. P. et al. Software engineering documentation: An ontology-based approach. In: WebMedia/LA-WEB. [S.l.]: IEEE Computer Society, 2004. p. 38-40. ISBN 0-7695-2237-8.
[9] NGUYEN, T. N. et al. Multi-level configuration management with fine-grained logical units. In: EUROMICRO-SEAA. [S.l.]: IEEE Computer Society, 2005. p. 248-257. ISBN 0-7695-2431-1.
[10] ARANTES, L. d. O.; FALBO, R. d. A. An infrastructure for managing semantic documents. In: EDOCW. [S.l.]: IEEE Computer Society, 2010. p. 235-244. ISBN 978-0-7695-4164-8.
[11] KITCHAROENSAKKUL, S.; WUWONGSE, V. Software composing based on a unified SCM system. In: SEKE. [S.l.: s.n.], 2001. p. 321-325.
[12] CONRADI, R.; WESTFECHTEL, B. Version models for software configuration management. ACM Comput. Surv., ACM, New York, NY, USA, v. 30, n. 2, p. 232-282, Jun. 1998. ISSN 0360-0300.
[13] ZEGINIS, D.; TZITZIKAS, Y.; CHRISTOPHIDES, V. On the foundations of computing deltas between RDF models. In: Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference. Berlin, Heidelberg: Springer-Verlag, 2007. (ISWC'07/ASWC'07), p. 637-651.
[14] ZEGINIS, D.; TZITZIKAS, Y.; CHRISTOPHIDES, V. On computing deltas of RDF/S knowledge bases. ACM Trans. Web, ACM, New York, NY, USA, v. 5, n. 3, p. 14:1-14:36, Jul. 2011. ISSN 1559-1131.
[15] TOPQUADRANT. TopBraid Composer. 2013. Available at: . Accessed: 15/10/2013.