Feb 17, 1995 - Austin, Texas 78712. Abstract ... University of Texas at Austin ... For generators to be viable tools of future software development envi- ronments ...
Dept. Computer Sciences University of Texas at Austin Technical Report 95-03
Validating Component Compositions in Software System Generators Don Batory and Bart J. Geraci Department of Computer Sciences The University of Texas Austin, Texas 78712 Abstract GenVoca generators synthesize software systems by composing components from reuse libraries. Although GenVoca components can be composed in a vast number of ways, not all compositions are correct. In this paper, we present a model for validating component compositions. The model is based on attribute grammars and provides a powerful debugging capability of explanation-based error reporting. We demonstrate our results with examples from a GenVoca generator for container data structures. Keywords: software architectures, software system generators, attribute grammars, domain models, GenVoca, software components, explanation-based error reporting.
1 Introduction Software component technologies will play an important role in future software development. Examples of today’s componentry include Unix file filters and Visual Basic custom controls (VBXes) [Ude94]. Support for componentry in distributed environments is under development: the Object Management Group’s CORBA (Common Object Request Broker Architecture), Microsoft’s COM (Common Object Model), and Sun’s DOE (Distributed Objects Everywhere) are examples. Collectively, these technologies are aimed at hand-crafted applications: programmers are responsible for both selecting components from a library and manually-interconnecting (and modifying) them so that they work together correctly. There are at least three areas of research that have the potential of greatly accelerating the use and importance of componentry; all deal with the subject of software architectures - i.e., building systems from prefabricated components. Micro-architectures, also called design patterns, involves the cataloging of expertapproved solutions to common object-oriented design problems. Patterns define relationships that should exist among components so that the resulting software exhibits certain desirable properties (e.g., extensibility) [Gam94]. Macro-architectures, as exemplified by the work of Garlan and Shaw [Gar92], is the clarification and formalization of the role of architectural styles in the selection, composition, and interconnection of components that define systems. Among other topics, the development of domain-independent tools for interconnecting components to build systems is a goal of this research [Gar94]. Domain-specific software system generators is the third emerging technology. Generators convert highlevel specifications of systems into actual source code. Generators will target domains where customized software is common, expensive to build, and often performance-driven. There are many different models of software generation and relevant base technologies that are being explored. ARPA’s DSSA (DomainSpecific Software Architecture) program has focused on generators for command-and-control systems and avionics [Gra92]; the National Institute of Standard’s (NIST) recent program on software componentry has a broader scope. Existing results on automatic programming (e.g., AP5 [Coh89]) and designing reusable software (e.g., Synthesis [Wei90] and RESOLVE [Sit94]) are relevant. So too is research that explores the formal foundations of componentry, parameterization, and library interconnection languages ([Gog87, Sit94, Mor94]).
1
2/17/95
In this paper, we examine a particular model of software system generation called GenVoca [Bat92b]. The concepts that underlie GenVoca have been distilled from independently conceived generators for different domains: Genesis (a generator of database management systems [Bat88]), Avoca/x-kernel (communication software [Hut91]), Ficus (file systems [Hei93]), P2 (container data structures [Bat93-94]), and ADAGE (real-time avionics [Cog93]). Common characteristics of GenVoca generators are that the performance of generated systems often match or exceed that of hand-written code and that substantial productivity gains (e.g., factors of 3-4 over hand-written code) are delivered through the reuse of components. There are many open problems associated with GenVoca generators. One of the first problems encountered by users is that it is very easy to specify syntactically correct combinations of components that are semantically incorrect. As an example, there are many compositions of P2 components that will cause incorrect data structure code to be generated. For generators to be viable tools of future software development environments, it is absolutely essential that component compositions be automatically validated. That is, generators should automatically detect composition errors - and possibly even suggest repairs for these errors - rather than burdening users with the virtually impossible task of debugging generated code. In this paper, we present an engineering solution to this important problem, which we call design rule checking, that we have developed and used over the last five years in the Genesis (databases) and P2 (data structure) generators. We show that GenVoca models of software domains are grammars, where sentences correspond to component compositions. By encoding compositional constraints as inherited and synthesized attributes, we show that attribute grammars provide a natural formulation of the legal sentences (component compositions, software systems) of a domain. We illustrate our results with examples from the P2 data structure generator.
2 The GenVoca Model GenVoca is a domain-independent model for defining scalable families of hierarchical systems from components. The premise of GenVoca is that fundamental programming abstractions underlie all mature software domains. By standardizing these abstractions and their implementations, one can define software “building blocks”. Although the number of fundamental programming abstractions in a domain is rather small, there is a huge number of possible implementations. The GenVoca approach also advocates a layered decomposition of implementations, where each layer or component encapsulates a primitive domain “feature”. The advantage of GenVoca models is scalability [Bat92-93, Big94]: component libraries are relatively small and grow at the rate new components are entered; however, the number of possible combinations of components (i.e., distinct software systems in the domain that can be defined) grows astronomically. The distinguishing features of GenVoca are realms (or libraries) of plug-compatible components, type equations, and symmetric components. Components and Realms. A hierarchical software system is defined by a series of progressively more
abstract virtual machines. A component or layer is an implementation of a virtual machine. The set of components that implement the same virtual machine is called a realm. A realm is, in effect, a library of plugcompatible and interchangeable components. Generally, the membership of a realm can be enumerated. Consider realms S and T: S = { a, b, c } T = { d[ S ], e[ S ], f[ S ] }
2
2/17/95
The above notation means that realm S has three members, namely components a, b, and c, that are different implementations of the interface (i.e., virtual machine) of S. Realm T also has three members, each representing a distinct and alternative implementation of the interface of T. Parameters and Transformations. Components may be parameterized. All components of realm T, for example, have a single parameter of realm S.1 Parameters of a component indicate the realm interfaces that are imported by a component. Each component of T, such as d, exports the virtual machine interface of T and imports virtual machine interface of S. Component d encapsulates a complex transformation that maps objects and operations of export interface T to the objects and operations of import interface S. The key idea is that d’s transformation generally does not depend on how the interface of S is implemented, thus enabling any component of S to instantiate its parameter.
A critical aspect of GenVoca components is that they are all designed to work cooperatively without violating their encapsulations. For example, all components of T may get much of their information via calls to the virtual machine of S, but inherent in component designs are other common knowledge about global variables, etc., and the side effects that their updates might trigger. Much of this information can be made explicit as component parameters; however the key point is that components — whether or not they have parameters — meet system constraints and can exploit knowledge that is known system-wide. Systems and Type Equations. A software system by modeled as a type equation. Consider the following
two systems: System_1 = d[ b ]; System_2 = f[ a ]; System_1 is a composition of component d with b; System_2 composes f with a. Note that both systems are equations of type T. This means that both systems implement the same virtual machine and hence, System_1 and System_2 are interchangeable implementations of the interface of T.2 Grammars, Families of Systems, and Scalability. Realms and their components define a grammar
whose sentences (component compositions) are software systems. In Figure 1, we enumerate the memberships of realms S, T, and W, and to the right we show the corresponding grammar: S = { a, b, c
}
T = { d[S], e[S], f[S] W = {
}
n[W], m[W], p, q[T,S]
}
S :=
a
|
b
|
c
T :=
d S
| e S | f S ;
W :=
n W
| m W |
p
;
| q T S ;
Figure 1. Realms, Components, and Grammars
Just as the set of all sentences defines a language, the set of all component compositions defines a Parnas family of systems [Par76]. Adding a new component to a realm is akin to adding a new rule to a grammar; the family of systems automatically enlarges. Because large families of systems can be built using relatively few components, GenVoca is a scalable model of software construction.
1. Parameterizations that we examine in this paper are simple enough to dispense with formal parameter names. 2. Note that composing components can be interpreted as stacking layers in hierarchical software systems. We use the terms component and layer interchangeably, although our use of the term ‘layer’ is different from its typical usage. 3
2/17/95
Symmetry. Just as recursion is fundamental to grammars, recursion in the form of symmetric components
is fundamental to GenVoca. More specifically, a component is symmetric if it exports the same interface that it imports (i.e., a symmetric component of realm W has at least one parameter of type W). Symmetric components have the unusual property that they can be composed in almost arbitrary ways. In realm W of Figure 1, components n and m are symmetric whereas p and q are not. This means that compositions n[m[p]], m[n[p]], n[n[p]], and m[m[p]] are possible, the latter two showing that a component can be composed with itself. In general, the order in which components are composed can significantly affect the semantics, performance, and behavior of the resulting system. Design Rules and Domain Models. In principle, any component of realm S can instantiate the parameter of any component of realm T. The resulting equations would be type correct. Although an equation may be
type correct, there are always certain combinations of components that are semantically incorrect. Additional domain-specific constraints called design rules are needed to preclude incorrect component combinations. Design rule checking (DRC) is the process of applying design rules to validate type equations. An architecture reference model (or domain model) for a GenVoca generator consists of realms of components and rules that govern component composition. In the next section, we briefly review the realms of the P2 generator and illustrate some of its design rules.
3 The P2 Generator and Design Rules P2 is a GenVoca generator for container data structures [Bat93-94]. The domain model of P2 relies on two realms: ds and mem. ds components export a standardized container-cursor interface. Among the components of ds are those that implement binary trees, doubly-linked ordered and unordered lists, free lists of previously deleted elements, and sequential arrays. mem components export standardized memory allocation and deallocation operations. Among its members are components that manage space in persistent and transient memory. ds
= { bintree[ ds ], dlist[ ds ], odlist[ ds ], avail[ ds ], mlist[ ds, ds ], malloc[ mem ], array[ mem ], inbetween[ ds ], top2ds[ ds ], ... }
mem = { transient, persistent, ... }
// // // // // // // // //
binary tree unordered doubly-linked list key-ordered doubly-linked list free list of deleted elements multilist indexing heap storage sequential storage has deletion actions for some components the topmost layer of a ds expression
// transient memory allocation // memory mapped persistence
Currently there are over fifty components in P2, many of which are symmetric. Container data structures are defined by type equations that typically reference from five to twenty components. Such equations are much too elaborate to validate by inspection; even the correctness of simple equations is not obvious. Validation is further complicated by the fact that many components have rather obscure rules for their use. For example, the inbetween component encapsulates algorithms that are shared by many data structure components (e.g., bintree and dlist). These algorithms deal with the positioning of a cursor immediately after an element has been deleted (e.g., does the cursor point to a “hole” or should it be positioned on the next element in the container?). These details are hidden from users, but are needed for generating correct data structure algorithms. 4
2/17/95
The valid usage of inbetween requires that a single copy of inbetween be present in a type equation that uses at least one data structure component (dlist, bintree, etc.) and it should precede all such components in the equation. The right equation, below, shows a correct usage of inbetween - i.e., it precedes all data structure components. The wrong equation, below, shows an incorrect usage: a data structure component dlist appears prior to inbetween. right = wrong =
… inbetween[ … [ dlist[ dlist […] ] ] ] ] ; ... dlist[ … [ inbetween[ dlist[ … ] ] ] ] ;
Rules such as this should not be borne by programmers; they are much too easy to forget and to be misapplied. A design rule checker that tests such rules automatically and reports errors when they occur removes a tremendous burden from P2 users. We first present a general model of design rule checking in Section 4 and then show how we adapted this model to P2 and Genesis in Sections 5 and 6.
4 A Model of Design Rule Checking Layers v.s. Transformations. GenVoca components have dual interpretations. One way is to interpret them as layers of a software system. Consider type equation S = A[B[C]]. Figure 2a depicts S where layer A sits atop B and B sits atop C. Alternatively, GenVoca components can be interpreted as program transforma-
tions that encapsulate mappings between export and import interfaces. A type equation specifies the sequence of transformations that is to be applied to an abstract program to convert it into a concrete program. Figure 2b depicts the transformation of abstract program P into a concrete program PABC by S: program P is mapped to program PA by transformation A, program PA is mapped to PAB by transformation B, and program PAB is mapped to PABC by transformation C.3 S
abstract program
P A
A
PA (a)
B
(b)
B PAB C
C
PABC
concrete program
Figure 2. Type Equations as Program Transformations
The Genesis generator is best described by Figure 2a, where database management systems are formed by compositions of layers. The P2 generator, in contrast, more closely matches Figure 2b. P2 takes a C program as input. This program is abstract and not executable because it references P2 data types and these data types have no implementation. After the P2 generator applies the transformations of S, the same C program is produced, but now an implementation for all P2 data types has been generated. The resulting concrete program is executable. Attributes and States of Program Development. Design rule checking examines states of program (or type
equation) development; it does not model states of program executions. This distinction is illustrated in Figure 3. A component transforms program P into program P’. Associated with every program is a set of attributes that defines its states or properties. Thus, we might define an attribute state whose initial value is no-loops (meaning that program P has no loops), and after applying a transformation, state has the value
3. Note that a type equation is not evaluated functionally; that is, the outermost transformation is applied first, not the innermost. The intuition behind this evaluation order stems from a layered interpretation: as one descends into system S, the outermost layers will be encountered first and the innermost layers last. 5
2/17/95
has-loops (meaning that program P’ has loops). Design rule checking deals with the testing and assignment of program design states; it assumes that all transformations (components) are correct. transformation program P
program P’ state = has-loops
state = no-loops
Figure 3. Modeling States of Program Development Design Rules. We have encountered two different kinds of design rules in the development of Genesis and P2. The first deals with preconditions for component usage. Suppose component K has a precondition that attribute A must have the value v (see Figure 4a). For K to be correctly used, there must be some component, say X, that sits above K whose postcondition sets A = v. Note that X need not be immediately above K; X might reside far above K in a type equation. In general, we have found preconditions for component
usage to be conjunctions of simple predicates. The second kind of design rule deals with preconditions for parameter instantiation. Suppose component K has a single parameter with the precondition that attribute A must have the value v (see Figure 4b). For the parameter to be correctly instantiated, there must be some component, say X, that lies below K whose postcondition sets A = v. Analogously, X need not be immediately beneath K; X might reside far below K in a type equation. Once again, we have found these preconditions to be conjunctions of simple predicates. The preconditions and postconditions in this paragraph (and Figure 4b) are different than those in the previous paragraph (and Figure 4a). To distinguish them, preconditions for parameter instantiation will be called prerestrictions and the corresponding postconditions are postrestrictions. Every component in our model has preconditions, postconditions, prerestrictions, and postrestrictions (see Figure 4c). How are they different? Postconditions for a component K declare the attribute values that K will export to layers beneath K in a type equation. Postrestrictions for K declare the attribute values that K will export to layers above K in a type equation. Preconditions test the cumulative postconditions of layers that lie above K, while parameter prerestrictions test the cumulative postrestrictions of components that lie beneath K. Thus, the concepts are similar, but distinct. X
K post: A = v
pre: A = v
pre: A = v
post: A = v
preconditions
postrestrictions K
K
X
(a)
(b)
postconditions
prerestrictions (c)
Figure 4. Different Kinds of Design Rules
Given the preconditions, postconditions, prerestrictions, and postrestrictions of every component of a type equation, design rule checking involves: •
a top-down propagation of postconditions and the testing of component preconditions, and
•
a bottom-up propagation of postrestrictions and the testing of parameter prerestrictions.
In the following sections, we define algorithms for top-down and bottom-up design rule checking.
6
2/17/95
4.1 Top-Down Design Rule Checking Figure 5a shows three components with their preconditions and postconditions. When these components are composed, they define a transformation that maps abstract programs to concrete programs. The initial set of attribute values that describes an abstract program is denoted by top. The set of conditions that hold after transformation A has been applied is top-A, which is computed by applying the postconditions of A to top (i.e., top-A = postcondition-A ⊕ top). The left-associative ⊕ operator is the postcondition propagation operation. Similarly, the set of conditions that hold after transformation B has been applied is top-AB = postcondition-B ⊕ top-A, and so on. Figure 5b depicts this propagation of postconditions. As attribute states are propagated downwards, component preconditions can be tested. For example, transformation A can be applied only if top implies its preconditions (top ⇒ precondition-A). Similarly, transformation B can be applied only if top-A implies B’s preconditions (top-A ⇒ precondition-B), and so on (see Figure 5c). When type equations correspond to a linear stack of components, there is nothing particularly novel about top-down design rule checking: only two operators ⊕ and ⇒ are needed. top ⇒
top = initial attribute states of abstract program precondition-A A
top-A ⇒
= top-A
postcondition-B ⊕ top-A
= top-AB
precondition-B B
top-AB ⇒
postcondition-A ⊕ top
precondition-C C
postcondition-C ⊕ top-AB = top-ABC
(a) preconditions and postconditions (c) precondition testing
(b) postcondition propagation
Figure 5. Top-Down Design Rule Checking
In general, type equations correspond to trees of components. Branching arises when components have multiple parameters. Consider component A[x,y] which has two parameters x and y. GenVoca components are designed to have the (possibly unusual) property that their parameters can be independently instantiated; that is, how one instantiates parameter x does not impact how one must instantiate parameter y, and vice versa. Another (possibly unusual) property is that each parameter of a component has its own postcondition. Thus, component A would have two postconditions: one for x and another for y. Each postcondition defines the set of attribute values that hold for that particular parameter. We have observed that, in general, postconditions of different parameters will not be the same. As an example, the realm (type) of a parameter could be expressed as a postcondition. Thus, if the realms for parameters x and y are different, so too would be their postconditions. Figure 6 depicts the general situation. Component A has a postcondition for x (denoted postcondition-x) and a postcondition for y (denoted postcondition-y). If top is the set of conditions that hold prior to transformation A, top-x is the set of conditions that hold for parameter x after A has been applied, and top-y is
7
2/17/95
the set of conditions that hold for parameter y. top-x is computed by applying x’s postcondition to top (top-x = postcondition-x ⊕ top) and top-y is computed similarly (top-y = postcondition-y ⊕ top). top legend A
component with two parameters postcondition-x propagation of conditions
x
top-x = postcondition-x ⊕ top
y
postcondition-y
top-y = postcondition-y ⊕ top
Figure 6. Propagation of Postconditions for Components with Multiple Parameters
There is a simple, recursive algorithm for the top-down propagation of postconditions and the testing of component preconditions. In Table 1, we list notations for referencing a component, its children, its preconditions, and parameter postconditions. Our algorithm is given in Figure 7. Notation
Meaning
k.parent
the parent component of k
k.childi
the component that instantiates parameter i of k
k.pre
the precondition of k
k.posti
the postcondition of parameter i of k
k.preresi
the prerestriction of parameter i of k
k.postres
the postrestriction of k
k.cpost
the cumulative postcondition of k’s ancestors
k.cpostres
the cumulative postrestriction of the system rooted at k
k.∆
the postrestriction merge function of k TABLE 1. The structure of a component k
4.2 Bottom-Up Design Rule Checking Every parameter of a component has preconditions (called prerestrictions) for instantiation; every component also has postconditions (called postrestrictions) that are exported to higher-level layers in a type equation. Figure 8 depicts a typical situation: components Q, R, S, T, and W are composed hierarchically, and Q has a single parameter. In general, the prerestrictions for Q are not satisfied by the component R that instantiates its parameter, but rather by components deep within the system rooted at R. That is, the prerestrictions of Q may be satisfied by R or S or T or W, or any combination thereof. This gives rise to a different interpretation of instantiation, namely that systems instantiate parameters, not components. Every system exports a realm interface plus a set of attribute values (called system postrestrictions) that higher-level layers can reference. A component parameter has been correctly instantiated if the postrestrictions of the instantiating system imply that parameter’s prerestrictions. Figure 9a shows a linear stacking of three components with their associated prerestrictions and postrestrictions. Non-parameterized component C can be understood as a system whose postrestrictions are bot-C = postrestriction-C. The postrestrictions of the system rooted by B is bot-B, which is computed by applying
8
2/17/95
// // // //
if te is the root component of a type equation and top defines the initial attribute states for an abstract program, precondition_validation( te, top ) will return TRUE if te is free of precondition errors
boolean precondition_validation( root, root_conditions ) { root.cpost = root_conditions; if (root.cpost ⇒ root.pre) { foreach child i of root { cpost = root.posti ⊕ root.cpost; if (¬ precondition_validation( root.childi, cpost )) return FALSE ; } return TRUE ; } return FALSE; }
Figure 7. Precondition Validation Algorithm
the postrestrictions of B to the postrestrictions of the imported system rooted at C, i.e., bot-B = postrestriction-B ⊕ bot-C. Similarly, the postrestrictions of the system rooted by A is bot-A, which is computed by applying the postrestrictions of A to the postrestrictions of the imported system rooted at B (top-A = postrestriction-A ⊕ bot-B), and so on. Figure 9b depicts this propagation of postrestrictions. As attribute states are propagated upwards, parameter prerestrictions can be tested. For example, component (or system) C can instantiate B’s parameter if the postrestrictions of C imply B’s prerestrictions (i.e., bot-C ⇒ prerestriction-B). Similarly, the system rooted at B can instantiate A’s parameter if that system’s postrestrictions imply A’s prerestrictions (i.e., bot-B ⇒ prerestriction-A), and so on (see Figure 9c). If there are general constraints that a system must satisfy, these would be expressed by a goal prerestriction, where the postrestrictions of the system must imply the prerestrictions of goal. Note that in the case where type equations correspond to linear stacks of components, the same operators ⇒ and ⊕ used in top-down design rule checking are used in bottom-up design rule checking. bot-A ⇒
bot-B ⇒
Q
system rooted at R
goal = set of properties to be established postrestriction-A
⊕ bot-B
= bot-A
postrestriction-B
⊕ bot-C
= bot-B
postrestriction-C
= bot-C
A prerestriction-A
B bot-C ⇒
prerestriction-B
R
C
S
(a) prerestrictions and postrestrictions T
W
(c) prerestriction testing
Figure 8. System Instantiation of Parameters
(b) postrestriction propagation
Figure 9. Bottom-Up Design Rule Checking
9
2/17/95
When components have multiple parameters, there is the need for an additional operator ∆. Again consider component A[x,y]. Suppose system X instantiates parameter x and system Y instantiates y. The conditions that will be exported by the system A[X,Y] is determined by the postrestrictions of A applied to a merging of the conditions exported by systems X and Y. ∆ is this merging operator. That is, the postrestrictions of the system A[X,Y] equals postrestriction-A ⊕ ∆(postrestriction-X, postrestriction-Y). In general, the ∆ operator takes the postrestrictions of any number of systems as input and merges them into a single postrestriction. In principle, the ∆ operator is component specific, i.e., each component may have its own way of merging postrestrictions. Given the operators ⊕, ⇒, and ∆, there is a simple, recursive algorithm for the bottom-up propagation of postrestrictions and the testing of parameter prerestrictions. This algorithm is given in Figure 10. // // // //
if te is the root component of a type equation and goal is the prerestriction that te is to satisfy, prerestriction_validation(te,goal) returns TRUE if te has no prerestriction errors and that it satisfies goal
boolean prerestriction_validation( root, root_prerestriction ) { if (root has no children) { root.cpostres = root.postres; } else { foreach child i of root { // return false if any subtree has prerestriction errors if ( ¬ ( prerestriction_validation(root.childi, root.preresi ) ) return FALSE; } root.cpostres = root.postres ⊕ root.∆( root.child1.cpostres, root.child2.cpostres, } return root.cpostres
⇒
…)
;
root_prerestriction ;
}
Figure 10. Prerestriction Validation Algorithm
4.3 Attribute Grammars McAllester [McA94] observed that the concepts of realms, components, attributes, top-down design rule checking and bottom-up design rule checking can be unified by attribute grammars [Aho88, Der88]. We saw in Section 2 that realms of components define a grammar. In the last two sections, we used attributes to model states of program development. Component postconditions correspond to inherited attributes, i.e., attributes whose values are determined by component ancestors. Component postrestrictions correspond to synthesized attributes, i.e., attributes whose values are determined by component descendants. The practical benefit of this connection with attribute grammars, besides the fact that design rule checking reduces to a well-studied problem, is that common tools, such as lex and yacc, are well-suited for implementing design rule checkers. We explain an implementation of our algorithms in the next two sections.
10
2/17/95
5 Targeting DRC Algorithms for Specific Domains The DRC algorithms of Section 4 are domain-independent. To specialize them to a particular domain, we need definitions and representations for attributes, predicates, and the operators ⊕, ⇒, and ∆. In the following, we explain the representations for P2; the same representations are also used for Genesis.
5.1 Attributes An attribute models a property that exposes some composition constraint. The attributes that we have encountered assume restricted values: any, assert, negate, inherit, and set4. Table 2 defines the semantics of each value. Attribute Value any assert negate inherit set
Interpretation nothing is known about the property of attribute property of attribute is asserted property of attribute is negated property value is inherited from existing conditions property of attribute is both asserted and negated
TABLE 2. Attribute Values used in P2 and Genesis
Example P2 attributes are: df_present and retrieval. df_present represents the property that a component realizes logical deletions. That is, instead of physically deleting an element, the component marks the element deleted but does not immediately reclaim its space. The retrieval attribute represents the property that a component interlinks all elements of a container to facilitate searching. Components that implement data structures (e.g., bintree, dlist, etc.) have the retrieval property. The assignment of assert or negate to these attributes depends on whether a component exhibits the desired property. inherit is used when a component does not alter an attribute’s value.
5.2 Predicates Preconditions and prerestrictions request specific attribute values (e.g., any, assert, negate, set), but not how the attribute value was determined (e.g., inherit). This leads to only eight different primitive predicates that can be defined over a single attribute. Table 3 lists these predicates. In general, compound prediPredicate true (any) assert negate set not assert not negate not set false
Interpretation true (no constraints) attribute has assert value attribute has negate value attribute has the set value attribute does not equal assert attribute does not equal negate attribute does not equal set false (unsatisfiable)
TABLE 3. Primitive Predicates used in P2 and Genesis
4. set is a value that only arises during the merge of the postrestrictions of two or more systems, where one system asserts a property while another negates this same property. 11
2/17/95
cates are conjunctions of primitive predicates, one primitive predicate for each attribute. Compound predicates are encoded as a vector of primitive predicates that are indexed by attribute. Thus, if P = P1 ∧ P2 ∧ … ∧ Pn is a compound predicate, it would be encoded as the vector [P1, P2, …, Pn] where Pi is the primitive predicate for attribute i.
5.3 Postcondition Propagation Operator ⊕ Component postconditions and postrestrictions assert, negate, and declare set values, or they may propagate existing (inherited) conditions. Table 4 defines the condition propagation operator + for a single attribute. Existing Condition Postcondition/Postrestriction + Existing Condition
Postcondition or Postrestriction
true
assert
set
negate
false
assert
assert
assert
assert
assert
assert
set
set
set
set
set
set
negate
negate
negate
negate
negate
negate
inherit
true
assert
set
negate
false
TABLE 4. The Propagation Operator + for a Single Attribute
Given a postcondition/postrestriction P = [P1, P2, …, Pn] and the vector of existing conditions E = [E1, E2, …, En], the ⊕ operator corresponds to vector addition: P ⊕ E = [ P1 + E1, P2 + E2, …, Pn + En ]
5.4 Implication Operator ⇒ The implication operator
→ for a single attribute is defined by a truth-table (Table 5). Given a vector of Precondition or Prerestriction
Existing Condition → Precondition/Prerestriction
Existing Condition
true
assert
set
negate
not assert
not set
not negate
false
true
true
false
false
false
true
true
true
false
assert
true
true
false
false
false
true
true
false
set
true
false
true
false
false
false
false
false
negate
true
false
false
true
true
true
false
false
false
false
false
false
false
false
false
false
false
TABLE 5. The Implication Operator → for a Single Attribute
existing conditions E = [E1, E2, …, En] and a precondition/prerestriction vector P = [P1, P2, …, Pn], the implication operator ⇒ has a simple realization: all primitive predicates must be true for the compound predicate to be true: E
⇒
P = ( E1 → P1 )
∧ ( E2 → P2 ) ∧ ... ∧ ( En → Pn )
12
2/17/95
5.5 The Merge Operator ∆ A characteristic of the P2 domain model is that most components share the same ∆ operator. In general, the “merge” of the postrestrictions of n systems corresponds to copying of the postrestrictions of the first system and discarding the postrestrictions of the rest. That is: ∆( postrestriction1, postrestriction2, … ) = postrestriction1
The reasons for this are rather involved and peculiar to data structures and databases. Readers who are interested in a detailed justification should consult [Bat85].
6 Implementation Notes The implementation of our DRC algorithms was straightforward: the source files consist of 1500 lines of lex and yacc. We wrote a general utility, called dreck, that would allow designers to declare realms, components, and their design rules based on the representations we noted previously for attributes, predicates, and DRC operators. Figure 11 shows a dreck declaration of the array component and its design rules. A component’s name, realm membership, and realm parameters are declared on the first line. Subsequent lines define design rules. A precondition for array’s usage is that a layer above array needs to support logical deletion. This precondition is expressed by asserting the df_present property. Another design rule is to assert to layers above and below that array is a retrieval layer. Such a declaration is expressed by asserting the retrieval property as a postcondition and postrestriction. name of component realm of component realm parameters of component design rules X postcondition: A = negate
array : ds [ mem ] { # logical deletion layer required above array precondition assert df_present Z # assert that array is a retrieval layer # to all descendants and ancestors
postcondition: A = assert
postcondition assert retrieval postrestriction assert retrieval
precondition: A = assert
Y
} Figure 11. Specification of Design Rules
Figure 12. Locating and Fixing Errors
Algorithm Efficiency. Our DRC algorithms are efficient. Their complexity is O(mn), where m is the number of attributes and n is the number of components in a type equation. To give readers upper estimates of m and n, the most complicated type equations that we have encountered in Genesis and P2 have approximately 30 components (i.e., n ≤ 30). Genesis maintains the greatest number of attributes (m=14), whereas P2 has fewer (m=8), even though both generators maintain a library of 50 components. Although it is not difficult to envision greater values for m and n, substantially greater values (e.g., m, n > 100) seem unlikely. Explanation-Based Error Reporting. Detecting precondition and prerestriction errors is half the problem of
debugging type equations; reparations is the other half. Correcting a type equation requires two sources of information: one needs to know the component whose preconditions or prerestrictions failed, and also the
13
2/17/95
component(s) whose postconditions or postrestrictions caused the failure. We have found simple modifications of the + operator of Table 4 enables the relevant information to be collected. One technique that we use is illustrated in Figure 12. Suppose component Y’s precondition A=v failed. This means that some component above Y, say X, set A ≠ v as a postcondition. To repair this error, there needs to be another component, Z, that must be inserted below X and above Y whose postcondition is A=v. Techniques such as form the basis of a powerful explanation-based error reporting scheme. The following example illustrates the idea. Example. Suppose we would like a P2 container implementation that stores elements onto a binary tree,
whose nodes are stored sequentially in transient memory. A first attempt at a composition might be: 5 first_try = top2ds[ bintree[ array[ transient ] ] ];
Our DRC algorithms report the following: Precondition errors: an inbetween layer is expected between top2ds and bintree a logical deletion layer is expected between top2ds and array Prerestriction error: parameter 1 of top2ds expects a subsystem with a qualification layer
The first error reminds us (from Section 3) that we forgot that a bintree layer requires the inbetween layer to be above it. Not only that, the error message states exactly how to repair the equation; there is only one location where inbetween can go (i.e., in between top2ds and bintree). The second error reminds us (from Figure 11) that array requires a logical deletion layer above it. Further, this layer must be below top2ds. The third error tells us that a qualification layer is required below top2ds. Users with minimal experience with P2 are able to repair all of these errors easily. But suppose repairs lead to the following equation: second_try = top2ds[ inbetween[ bintree[ qualify[ delflag[ array[ transient ] ] ] ] ];
where qualify is a qualification layer and delflag is a logical deletion layer. The DRC response to this equation is: Precondition error: a retrieval layer is not expected above qualify; upper layer bintree caused the inconsistency
This error tells us that all retrieval layers must lie beneath qualify; the fix is to transpose bintree and qualify, which results in a correct equation: correct = top2ds[ inbetween[ qualify[ bintree[ delflag[ array[ transient ] ] ] ] ];
In general, the DRC error messages direct users to modify an incorrect equation to the nearest set of correct type equations in the space of all equations. We have found this advice works well. With minimal
5. bintree links elements of a container onto a binary tree; the nodes of the binary tree will be stored sequentially in an array; the array will reside in transient memory. The top2ds layer must root all P2 type equations; had top2ds been absent, the DRC algorithms would report additional errors. 14
2/17/95
experience, P2 users typically come very close to their desired equation on the first attempt; DRC messages enable them to correct errors quickly. Errors vs. Warnings. Errors have different levels of severity. Benign errors (such as the redundancy of
components, which merely introduces inefficiencies) should be reported to users, while fatal errors should terminate code generation. Error severity levels can be specified for any precondition/prerestriction declaration in dreck. Thus, when an error is detected, its severity level is known and the appropriate error handling routine (termination, warning) can be activated. Severity levels is yet another way to help users debug their compositions. Improvements. There are many ways in which our algorithms could be improved; we discuss three here.
First, the notation we adopted in Section 2 does not indicate that components often have many non-realm parameters. Such parameters are called configuration parameters [Cog93]: these are data types, tuning constants, performance constraints, etc. Currently, configuration parameters are checked during the compilation of P2 programs or during the compilation of the generated C programs. In general, however, we believe a unified treatment of DRC for all realm and configuration parameters is possible. Second, the representations of attributes and the DRC operators are specific to P2 and Genesis. We know that these representations are inadequate to express the DRC needs of Avoca (a generator for communication protocols) where querying and manipulating message addresses (e.g., are message addresses UDP or IP addresses?) are required. More general data types, other than enumerations, must be supported, along with more general predicates (e.g., disjunctions of conditions). Third, it may be possible to be more aggressive in repairing composition errors. For example, it seems likely that some errors can be repaired automatically (e.g., inserting inbetween into an equation). Also, it may be possible to identify the specific list of components (e.g., such as Z in Figure 12) which could be used in repairing errors.
7 Related Work and Insights Related Work. An early version of our DRC algorithms appeared in DaTE, the design rule checker for
Genesis [Bat92c]. DaTE only supported component preconditions; there were no prerestrictions. The limitations of DaTE lead to the work presented in this paper. McAllester has developed a functional programming language, VAG, based on variational attribute grammars, to address the design rule checking issues for the ADAGE generator [McA94]. Preconditions and prerestrictions are treated uniformly as constraints. The constraints associated with a component are expressed as a VAG program. When an avionics system is composed from components, the set of constraints that must be satisfied is defined by the composition of corresponding VAG programs. The VAG interpreter has limited reasoning abilities to infer values of unbound VAG program parameters. Parameterized programming is intimately associated with the verification of component compositions. Goguens’ work on OBJ [Gog83] and library interconnection languages, such as LIL and LILEANNA, [Gog87, Tra93] is seminal. The RESOLVE project explores the design of reusable and parameterized components, component certifiability, and the certifiability of component compositions [Sit94]. There are many similarities among these works and ours, such as GenVoca realms define a simple module/library interconnection language. However, there is a fundamental difference: the composition of OBJ, LILEANNA, and RESOLVE components is verified locally; components can only constrain the behavior of immediately adjacent components, and not components that reside far above or below them in a hierarchy. It is the propagation of conditions up and down a layer hierarchy that seems unique to our work.
15
2/17/95
Our work is also an example of the types of problems encountered in software architectures [Gar94, Mor94]. To our knowledge, the problems of verifying compositions of components in the context of architectures has only begun to be addressed. Insights. In general, determining whether a composition of components meets the specification of a target
system is a challenging problem in program verification. We know how difficult simple programs can be to verify; proving entire systems correct automatically is beyond today’s technology. Thus, it would seem unlikely that solutions to this problem would be automatable. However, we have found exactly the opposite in our research. The problem of design rule checking is straightforward and not only is its solution automatable, it is efficient. How can this be? There are many reasons. It may be due to the questions that we are posing or that the domains we are analyzing are mature and fairly static. Other reasons might be revealed if there were a model that defined the formal underpinnings of GenVoca. However, our best conjecture at this time centers on the observation that GenVoca is a methodology for designing reusable software components and for composing these components into systems. We know that one of the main reasons why object-oriented design methodologies is so powerful and attractive is because of their ability to manage and control software complexity [Rum91, Boo91]. Standardization of domain abstractions and programming interfaces to these abstractions is the core of GenVoca. It is not difficult to recognize that standardization is a very powerful way of managing and controlling the complexity of software of a family of systems. We have come to believe that standardization makes some problems tractable that would otherwise be very difficult. Design rule checking is a case in point; standardization seems to limit the ways in which components can constrain each other’s behavior. This, in turn, makes DRC tractable.
8 Conclusions Software system generators will become important tools for software developers. These generators will utilize libraries of reusable components to assemble complex, high-performance systems quickly and inexpensively. Each library component will have limitations, called design rules, on how it can be combined with other components. Experience has shown that validating component compositions is difficult to do by casual inspection; as the number of components and the complexity of their rules grow, a mechanical approach to validation is absolutely essential. We have presented algorithms that we have used to validate component compositions for the Genesis and P2 generators. They rely on attribute grammars as a framework to distinguish preconditions for component usage from preconditions for parameter instantiation. Our algorithms are practical: they are simple, easy to implement, and efficient. Moreover, we have shown that with additional techniques, it is possible to provide a powerful explanation-based error reporting capability to suggest how incorrect compositions can be repaired. We illustrated some of the capabilities of our algorithms with examples from the P2 (container data structure) generator. Our work raises interesting theoretical questions as to why our design rules are so simple. We currently believe that the answer lies the power of standardization to control the complexity of families of software systems. Components that are designed to be interoperable, plug-compatible, and interchangeable often makes otherwise very difficult problems tractable. Acknowledgements. We thank Chris Lengauer, Ira Baxter, Bruce Weide, and Steve Edwards for their help
in clarifying the concepts presented in this paper.
16
2/17/95
9 References [Aho88]
A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles, Techniques, and Tools, AddisonWesley, Reading, Massachusetts, 1988.
[Bat85]
D. S. Batory, “Modeling the Storage Architectures of Commercial Database Systems.” ACM Transactions on Database Systems, December 1985.
[Bat88]
D. S. Batory, J. R. Barnett, J. F. Garza, K. P. Smith, K. Tsukuda, B. C. Twichell, and T. E. Wise. “GENESIS: An extensible database management system”, IEEE Trans. on Software Engineering, November 1988.
[Bat92a]
D. Batory, V. Singhal, and M. Sirkin. “Implementing a Domain Model for Data Structures”, International Journal of Software Engineering and Knowledge Engineering, 2(3):375-402, September 1992.
[Bat92b]
D. Batory and S. O’Malley, “The Design and Implementation of Hierarchical Software Systems with Reusable Components”, ACM Transactions on Software Engineering and Methodology, October 1992.
[Bat92c]
D. S. Batory and J. R. Barnett. “DaTE: The Genesis DBMS Software Layout Editor.” In Conceptual Modeling, Databases, and CASE, Pericles Loucopoulos and Roberto Zicari, eds. John Wiley & Sons, New York, New York. 1992.
[Bat93]
D. Batory, V. Singhal, M. Sirkin, and J. Thomas, “Scalable Software Libraries”, Proc. ACM SIGSOFT, December 1993.
[Bat94]
D. Batory, V. Singhal, J. Thomas, S. Dasari, B. Geraci, and M. Sirkin. “The GenVoca Model of Software System Generators.” IEEE Software, September 1994.
[Boo91]
G. Booch. Object-Oriented Design With Applications, Benjamin-Cummings, 1991.
[Big94]
T. Biggerstaff. “The Library Scaling Problem and the Limits of Concrete Component Reuse”, IEEE International Conference on Software Reuse, November 1994.
[Bur77]
R. M. Burstall and J. A. Goguen. “Putting Theories Together to Make Specifications.” Fifth International Joint Conference on Artificial Intelligence (IJCAI). Cambridge, MA. 1977.
[Cog93]
L. Coglianese and R. Szymanski, “DSSA-ADAGE: An Environment for Architecture-based Avionics Development”, Proc. AGARD, 1993.
[Coh89]
D. Cohen, AP5 Training Manual, USC Information Sciences Institute, 1989.
[Der88]
P. Deransart, M. Jourdan, and B. Lorho. “Attribute Grammars: Definitions, Systems, and Bibliography.” Lecture Notes in Computer Science. Volume 323. Springer-Verlag. 1988.
[Dia87]
R. Prieto-Dìaz and J.M. Neighbors. “Module Interconnection Lagnuages.” Journal of Systems and Software. November 1986. pp. 307-334. Reprinted in [Fre87].
[Fre87]
P. Freeman, ed., Tutorial: Software Reusability, IEEE Computer Science Press, 1987.
[Gam94]
E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Micro-Architectures for REusable Object-Oriented Design, Addison-Wesley, 1994.
[Gar92]
D. Garlan and M. Shaw, “An Introduction to Software Architecture”, in Advances in Software Engineering and Knowledge Engineering, Vol. 1, V. Ambriloa and G. Tortora, eds., World Scientific Publishing.
[Gar94]
D. Garlan, R. Allen, and J. Ockerbloom, “Exploiting Style in Architectural Design Environments”, ACM SIGSOFT 1994.
17
2/17/95
[Gra92]
M. Graham and E. Mettala, “The Domain-Specific Software Architecture Program”, Proceedings of DARPA Software Technology Conference, 1992. Also, in Crosstalk: The Journal of Defense Software Engineering, October 1992.
[Gog83]
J.A. Goguen, “Parameterized Programming”, Workshop on Reusability in Programming. Newport, RI. September 1983.
[Gog87]
J.A. Goguen, “Reusing and Interconnecting Software Components”, Computer. February 1986, pp. 16-28. Reprinted in [Fre87].
[Gog89]
Joseph A. Goguen. “Principles of Parameterized Programming.” in Software Reusability. Ted Biggerstaff and Alan J. Perlis, eds. Volume I. Addison Wesley 1989. 159-225.
[Hei93]
J. Heidemann and G. Popek, “File System Development with Stackable Layers”, ACM Transactions on Computer Systems, March 1993.
[Hut91]
N. Hutchinson and L. Peterson, “The x-kernel: An Architecture for Implementing Network Protocols”, IEEE Transactions on Software Engineering, January 1991.
[McA94]
D. McAllester. “Variational Attribute Grammars for Computer Aided Design.” ADAGE-MIT94-01.
[Mor94]
M. Moriconi and X. Qian, “Correctness and Composition of Software Architectures”, ACM SIGSOFT 1994.
[Par76]
D.L. Parnas, “On the Design and Development of Program Families,” IEEE Transactions on Software Engineering, March 1976.
[Rum91]
J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, and W. Lorensen, Object-Oriented Modeling and Design, Prentice Hall, 1991.
[Sit94]
M. Sitaraman and B. Weide, “Component-Based Softwaer using RESOLVE”, ACM Software Engineering Notes, October, 1994.
[Tra93]
W. Tracz, “LILEANNA: A Parameterized Programming Language,” Advances in Software Reuse : Selected Papers from the Second International Workshop on Software Reusability. Lucca, Italy. Rubén Prieto-Dìaz and William B. Frakes, eds. IEEE Computer Science Press, March 24-26, 1993.
[Ude94]
J. Udell, “Componentware”, BYTE, May 1994.
[Wei90]
D.M. Weiss, Synthesis Operational Scenarios, Technical Report 90038-N. Version 1.00.01, Software Productivity Consortium. Herndon, Virginia. August 1990.
18
2/17/95