Assessing the size and complexity of formally specified conceptual models

Geert Poels
Research Assistant of the Fund for Scientific Research - Flanders
Katholieke Universiteit Leuven

Prof. dr. Guido Dedene
Katholieke Universiteit Leuven
Abstract - A number of measures are presented for the assessment of size and complexity of formally specified models of application domains. We show that formal specifications of the software problem domain can be rigorously measured using established theories like Measure Theory. The usefulness of the measures is addressed by commenting on the relationship between software measurement and software quality assurance.
Keywords - software measurement, early quality assurance, conceptual modelling, formal specifications
On the relationship between software quality and software measurement

The present paper was inspired by a number of observations from software engineering practice regarding the relationship between software quality and measurement. Before turning to the main theme of this text, i.e., the measurement of formally specified conceptual models, we need to discuss its relevance for early software quality assurance. It is hoped that this introduction helps to position our research within the broader field of software quality research.

Since the publication of Boehm’s research on software economics [1] there has been an increasing awareness that the first stages in software development are crucial for the quality of software. Apart from software process assessment and improvement programs, it is essential that early quality assurance focuses on the specifications of the software as captured in a vast number of software products. The establishment of effective early quality assurance programs relies heavily on three activities:
1) the identification and measurement of characteristics of software products as well as the identification and measurement of relevant software quality characteristics;
2) empirical research aimed at establishing and validating relationships between software product characteristics and software quality characteristics;
3) the elaboration of strategies, methods, techniques, tools and practical guidelines to affect or optimise the value of the software product characteristics such that the quality of the software is controlled and, if needed, improved.

The first two activities are within the domain of the software engineering research community, both in academia and in research institutes. Only the results of the third activity are of direct interest to the software engineer and the quality assurance staff. Nonetheless, the three activities are highly interrelated and the role of software measurement and experimentation must not be underestimated.
This paper presents results of ongoing research that is positioned in the fields of software measurement and quality assurance through formal specification techniques. Over the last couple of years a methodology (called M.E.R.O.DE.) has been developed that includes formal techniques to specify conceptual models of an application domain. The use of this methodology has a direct impact on the quality of the specifications, since it makes it possible to check formally whether the specifications are consistent, correct and complete. Apart from this kind of early quality assurance, it was decided to define a number of specification measures. The basic hypothesis guiding this research is that the characteristics of the formally defined specifications are related to quality characteristics (e.g., comprehensibility, maintainability, reusability, etc.) of the software that must be delivered. While our research, and accordingly this paper, has not addressed this relationship so far, work has been done to rigorously define measures of specification characteristics, in particular the size and the complexity of conceptual models. The measurement of these characteristics is necessary to control their values as well as to establish empirical relationships between them and quality characteristics (activity 2) and to conceive quality control and improvement programs (activity 3). The paper reports on our research efforts to define size and complexity measures of conceptual models that are meaningful and theoretically valid.
Conceptual modelling with formal specifications

Many conceptual modelling methods offer a combinative approach [2] to modelling. The static aspects of an application domain are mostly modelled using some variant of Entity-Relationship Diagramming [3]. Techniques related to Finite State Machines [4] are popular for the specification of behavioural aspects. For the modelling of communication and interaction aspects no particular preferred technique can be observed. The use of different techniques to model the different aspects of an application domain is not wrong per se. However, a thorough analysis of a number of OOA methods has revealed that there is a serious lack of techniques and procedures to guarantee inter-model consistency, to formally define the overall system behaviour and to check for problematic system behaviour [5]. A modelling method based on a formal specification technique has the potential to avoid these problems. In this section one such method is presented.

The method of interest is M.E.R.O.DE., which is an acronym for Model-driven Existence-dependency-Relationship Object-oriented DEvelopment [5], [6]. Basically, a process algebra was developed that, through inclusion of an existence dependency classification schema, is a suitable technique to model the static, dynamic and interaction aspects of an application domain. Hereafter, a very concise summary of the M.E.R.O.DE. process algebra is presented. The interested reader may wish to consult [5], [6] for additional details.

The M.E.R.O.DE. process algebra

Conceptual modelling refers to the identification of the components of an application domain. Two relevant types of components are object types and event types. Object types are classes of similar objects that participate in events that are atomic, have no duration and can be observed in the application domain. In turn, events are classified as occurrences of event types.
Let A be the universe of event types associated with the application domain that is our universe of discourse. The power set of A is denoted by P(A). The alphabet of an object type is the set of event types in which it participates. An object type participates in an event type if occurrences of the object type participate in occurrences of the event type. For every object type in the conceptual model with alphabet α, it holds that α ∈ P(A).

A set of regular expressions over A can be built with the operators ‘.’ (sequence), ‘+’ (selection) and ‘*’ (iteration). The set of regular expressions over A is denoted by R*(A). The sequence constraints of object types on participation in event types are defined by such regular expressions over A. These regular expressions are mathematically equivalent to Finite State Machines [4] and Jackson Structure Diagrams [7]. For every object type in the conceptual model with regular expression e, it holds that e ∈ R*(A).

Basically, object types are defined as tuples <α, e> ∈ <P(A), R*(A)> such that e is not in deadlock and every event type in α occurs at least once as an operand in e. Also, every operand in e must be an element of α. To select the alphabet and regular expression of an object type, the selector functions SA and SR are defined:
SA: <P(A), R*(A)> → P(A): P → α
SR: <P(A), R*(A)> → R*(A): P → e
It is further required that for each object type it must be possible to create an occurrence and to end the life of an occurrence. Hence, for the object type P, the alphabet SA(P) is partitioned into the three disjoint subsets c(P), m(P) and d(P), where
c(P) = {a ∈ A | a creates an occurrence of type P} ⊆ SA(P)
m(P) = {a ∈ A | a modifies an occurrence of type P} ⊆ SA(P)
d(P) = {a ∈ A | a ends an occurrence of type P} ⊆ SA(P)
and c(P) and d(P) may not be empty. Based on these three subsets the default sequence constraints¹ are given by ∑c(P) . (∑m(P))* . ∑d(P). The actual sequence constraints of an object type cannot be less deterministic than this default.

The object types in a conceptual model are related. The classification schema used in M.E.R.O.DE. is the existence dependency relation. Object type P is existence dependent on object type Q if the life of each occurrence p of type P is embedded in the life of one single and always the same occurrence q of type Q. The object p is the marsupial object. The object q is the mother object. According to the M.E.R.O.DE. process algebra, if P ← Q (P is existence dependent on Q) then model consistency can be guaranteed by applying a set of rules² including the
• Propagation rule: SA(P) ⊆ SA(Q)
  Existence dependent objects cannot participate in events without the mother having knowledge of this event.
• Type of involvement rule: c(P) ⊆ c(Q) ∪ m(Q) and m(P) ⊆ m(Q) and d(P) ⊆ m(Q) ∪ d(Q)
  A marsupial cannot be created before its mother exists nor can it exist after the life of its mother has ended.
• Restriction rule: SR(P) may not be less deterministic than SR(Q)
  Any sequence of events in which a marsupial participates that is not acceptable from the point of view of the mother must be rejected.

Let A be the universe of event types. A conceptual model is basically a set of object types M ⊆ <P(A), R*(A)> that build a lattice of existence dependency relationships (see [5] for a formal definition).
Example - A Car Rental Company

The previous paragraph is illustrated using the conceptual model of a Car Rental Company. The existence dependency lattice is shown in fig. 1. The Rental object type is a contract between the Car and the Client object types. Its specifications stipulate the correct order of events that occur during the rental of a car. Apart from the rental interaction between a car and a client, these objects participate in other events. These are specified in the object types Car and Client.

[Figure 1: Existence Dependency Graph of the Car Rental Company, with the object types Car, Rental and Client]
¹ The symbol ∑ must be read as an exclusive and exhaustive selection. For instance, ∑c(P) means that object occurrences are created by one and only one create event belonging to an event type in c(P).
² For the complete set of consistency, correctness and completeness rules see [5] and [8].
The formal specifications of the object types are presented below. Car = Client = Rental =

The life cycles of car, client and rental objects proceed as follows. A car is first acquired by the company. Next it must be prepared for its first rental, e.g., by painting the company’s name on the doors. The car is then ready to be rented by clients of the company. This is specified by an iteration on rent and return event types, representing a ‘normal’ rental procedure. The iteration may be executed zero, one or more times. After any number of rentals, the life of the car is ended: either it is sold or it is written off. The life of a car can also be ended by a crash while on rent. Note that inside the ‘normal’ rent iteration there is a selection of rent and return events. Normally, a car is rented first and then returned. However, since a car can be crashed, a selection instead of a sequence is needed. Indeed, if a car is rented and next crashed, then no return event occurs.

The life cycle of the client is the default or trivial life cycle. When renting a car for the first time, a person becomes a client of the company. The person remains a client until an explicit ending event occurs, such as the client’s death. Again, the selection in the iteration allows the client to crash a car instead of returning it.

Finally, the life cycle of a rental is specified. Since the rental is a contract between a car and a client, its regular expression determines the interaction. First, a car is rented. Next, it is either crashed or returned.

Remark: The process algebra as described in this paper does not consider cardinality constraints. Normally, a car can only be rented by one person at a time. Until it is returned, no other client may rent the car. The specifications as shown here do not yet impose this constraint.
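The propagation rule and the type of involvement rule lend themselves to a mechanical check. The following Python sketch shows one possible way to do this; it is not part of M.E.R.O.DE. itself. The class `ObjectType`, the function names and the event type names (read off the life cycle descriptions of Car and Rental) are assumptions made for illustration. The restriction rule, which compares regular expressions, is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    """Hypothetical container for the partition c(P), m(P), d(P) of an alphabet."""
    name: str
    create: frozenset
    modify: frozenset
    end: frozenset

    @property
    def alphabet(self):
        # The alphabet is the union of the three disjoint subsets.
        return self.create | self.modify | self.end


def is_valid_partition(p):
    """c(P), m(P), d(P) are pairwise disjoint; c(P) and d(P) are not empty."""
    overlap = (p.create & p.modify) | (p.create & p.end) | (p.modify & p.end)
    return not overlap and bool(p.create) and bool(p.end)


def propagation_ok(p, q):
    """Propagation rule for P existence dependent on Q: alphabet(P) ⊆ alphabet(Q)."""
    return p.alphabet <= q.alphabet


def involvement_ok(p, q):
    """Type of involvement rule for P existence dependent on Q."""
    return (p.create <= (q.create | q.modify)
            and p.modify <= q.modify
            and p.end <= (q.modify | q.end))


# Event type names below are assumed, not taken from the formal specifications.
car = ObjectType(
    "Car",
    create=frozenset({"acquire"}),
    modify=frozenset({"prepare", "rent", "return"}),
    end=frozenset({"sell", "write_off", "crash"}),
)
rental = ObjectType(
    "Rental",
    create=frozenset({"rent"}),
    modify=frozenset(),
    end=frozenset({"return", "crash"}),
)

print(is_valid_partition(car), is_valid_partition(rental))
print(propagation_ok(rental, car), involvement_ok(rental, car))
```

With these assumed partitions, Rental (the marsupial) passes both checks against Car (the mother). A marsupial whose ending event is a create event of the mother would fail the type of involvement check.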
Measurement

The primary criterion for a measure is validity. Measure validity is guaranteed by the use of established theories of measurement. In this section one such theory, i.e., Measure Theory, and particularly the concept of ‘metrics’, is applied to software measurement. While this approach has been suggested by Fenton [9, pp. 59-60] and Ejiogu [10], it has not been adequately investigated by software measurement research. On the contrary, in the software measurement literature numerous examples of software metrics can be found that are not metrics in the sense of Measure Theory.

Mathematically, a metric is a measure of distance. It is defined as [11, p. 46]: Let X be a set. A real-valued function δ on X × X is a metric if and only if the following properties hold:
1. δ(x, y) > 0 unless x = y and δ(x, x) = 0
2. δ(x, y) = δ(y, x)
3. δ(x, y) + δ(y, z) ≥ δ(x, z)

A software metric is a metric where the set X is a set of software entities. In this section it is argued that it is meaningful to measure distances between software entities. By a proper selection of software entities (i.e., the arguments of the function δ) the concept of distance can be mapped into the concepts of size and complexity. In a first sub-section the selection of software entities is discussed. Next, metric-based measures of size and complexity are presented.

Objects of measurement

Each object type is defined by its formal specifications of alphabet and regular expression. Therefore, the size and complexity of an object type must be a function of its alphabet and regular expression. Accordingly, the size and complexity of an object type can be defined as distances from the specifications to something else. Let us first consider size and next complexity.

Size

The smallest object type that is valid according to the M.E.R.O.DE. process algebra is the object type with null alphabet. By default, this object type has no sequence constraints.
Its regular expression is the default life cycle on the empty alphabet, which is represented in the process algebra by the symbol “1”. Formally, the smallest object type is defined by <∅, 1>. Size is considered as a relative concept. It can be evaluated by comparing the specifications of an object type to the specifications of the smallest object type in the process algebra. Therefore, the size of an object type P = <α, e> is defined as the distance from <α, e> to <∅, 1>.

Complexity

Complexity is a difficult concept to define [9, p. 153]. Nonetheless, we believe that an approach similar to that taken for size can lead to a meaningful and useful definition of complexity. Our definition is based on a crucial assumption. Every event type in the alphabet and every sequence constraint in the regular expression adds to the complexity of an object type if its presence is not required to satisfy the rules of the process algebra. This assumption is
motivated by the observation that every object type in a conceptual model must obey the rules regarding consistency, completeness and correctness (cf. previous section). For instance, the propagation rule requires that the alphabet of a marsupial object type is contained within the alphabet of its mother object types. The presence of these ‘marsupial’ event types in the alphabet of the mother object type does not add to its complexity but to its size. Complexity may in this regard be considered as the opposite of simplicity. Because of the propagation rule the object type cannot be simplified by removing the ‘marsupial’ event types, since this would violate model consistency. On the other hand, there might be some event types that can be safely removed. We consider these event types as factors of complexity. Analogously, all sequence constraints that can be removed without violating the rules contribute to the complexity of the object type.

Complexity is also a relative concept. It is evaluated by comparing the specifications of an object type to its minimum specifications, i.e., the specifications that are simplified as much as possible without violating the rules of the process algebra. Since a simplification in one object type might induce simplifications in other object types, all specifications in a conceptual model must be simplified simultaneously. The complexity of an object type is then defined as the distance from its original specifications to its minimum specifications.

Note that this definition of complexity strongly depends on the above assumption. All definitions of software attributes (e.g., size, length, complexity, coupling, cohesion, functionality, etc.) are more or less subjective and implicitly based on intuition. However, the assumption makes it possible to distinguish between the concepts of size and complexity.
Example

A set of minimum specifications for the Car Rental Company is: Car = Client = Rental =
An alternative set is: Car = Client = Rental =

Basically, the minimum specifications must allow object occurrences to be created and ended. However, when the original specifications include more than one create or end event type in the alphabet, one may freely choose which event types to retain and which to remove. For instance, the life of a car can be ended by a ‘sell’ or a ‘write-off’ event. As a consequence, different sets of minimum specifications exist. For measurement purposes it does not matter which set is chosen. Assume that the first schema is chosen. One can now evaluate which event types and which sequence constraints contribute to the complexity of the object types. Factors of complexity are:
• Clients may crash the car that is on rent;
• A car’s life can be ended by selling it;
• After a car is acquired it must be prepared for its first rental.
These factors truly represent the complexity of the application domain. The domain can be simplified by removing these options from the specifications while still satisfying all consistency, completeness and correctness rules. The result is another valid, but significantly less realistic model of the Car Rental Company.

The metrics

Clearly, two sets of metrics must be defined: one for measuring the distance between alphabets and one for measuring the distance between regular expressions.

Distance - Alphabet

Each alphabet is a subset of the universe of event types A. The symmetric difference model defines a metric distance between sets [11, p. 208]. According to this model the distance between sets X and Y is defined by the number of elements in (X - Y) ∪ (Y - X). Therefore we define the following.

Let A be the universe of event types and let M ⊆ <P(A), R*(A)> be the set of object types that builds the conceptual model. Let the number of elements of a set be selected by the function “#”. For all object types P = <α, e> ∈ M, where <α’, e’> are the minimum specifications³ of P:
Size_alphabet(P) = #((α - ∅) ∪ (∅ - α)) = #(α)
Complexity_alphabet(P) = #((α - α’) ∪ (α’ - α)) = #(α - α’)

The alphabet size and complexity of a conceptual model are simply defined as the sums of the size and complexity values of its object types. Let A be the universe of event types and let M ⊆ <P(A), R*(A)> be the set of object types that builds the conceptual model.
Size_alphabet(M) = ∑_{P ∈ M} Size_alphabet(P)
Complexity_alphabet(M) = ∑_{P ∈ M} Complexity_alphabet(P)
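The alphabet measures can be sketched in a few lines of Python using the built-in set type. The sketch below is illustrative only: the function names are hypothetical, and the event type names for Car and Rental are assumptions read off the prose description of the Car Rental Company, not the formal specifications. It also checks by brute force, on the example sets only, that the symmetric difference distance satisfies the three metric properties.

```python
from itertools import product


def alphabet_distance(x, y):
    """Symmetric difference distance: #((X - Y) ∪ (Y - X))."""
    return len(set(x) ^ set(y))


def size_alphabet(alpha):
    """Distance from the alphabet to the empty alphabet."""
    return alphabet_distance(alpha, set())


def complexity_alphabet(alpha, alpha_min):
    """Distance from the alphabet to the alphabet of the minimum specifications."""
    return alphabet_distance(alpha, alpha_min)


def satisfies_metric_axioms(sets):
    """Brute-force check of the three metric properties on the given sets."""
    d = alphabet_distance
    for x, y, z in product(sets, repeat=3):
        if d(x, x) != 0 or (x != y and d(x, y) <= 0):
            return False
        if d(x, y) != d(y, x):
            return False
        if d(x, y) + d(y, z) < d(x, z):
            return False
    return True


# Assumed event type names; the minimum alphabets drop the factors of complexity.
car = {"acquire", "prepare", "rent", "return", "sell", "write_off", "crash"}
car_min = {"acquire", "rent", "return", "write_off"}
rental = {"rent", "return", "crash"}
rental_min = {"rent", "return"}

for name, alpha, alpha_min in [("Car", car, car_min), ("Rental", rental, rental_min)]:
    print(name, size_alphabet(alpha), complexity_alphabet(alpha, alpha_min))
# prints "Car 7 3" and "Rental 3 1"

print(satisfies_metric_axioms([set(), car, car_min, rental, rental_min]))  # prints True
```

The measure for a whole model would then be the sum of these values over all object types in M.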
Distance - Regular Expression

Since regular expressions are not merely sets of elements, a different model must be used to define a metric. To measure the distance between two regular expressions an approach is taken similar to the solution to the tree-editing problem [12], [13]. First, a set of elementary transformations is defined. Each elementary transformation is an editing operation on regular expressions.

Let A be the universe of event types. For e, e’ ∈ R*(A): ti(e) = e’, where ti(e) for subscript i = 0, 1, 2, …, 9 is defined as:
t0(e) = e . x = e’ (add right sequence event type)
t1(e) = x . e = e’ (add left sequence event type)
t2(e) = e + x = e’ (add right selection event type)
t3(e) = x + e = e’ (add left selection event type)
t4(e) = (e)* = e’ (add iteration)
t5(e) = t5(e’ . x) = e’ (delete right sequence event type)
t6(e) = t6(x . e’) = e’ (delete left sequence event type)
t7(e) = t7(e’ + x) = e’ (delete right selection event type)
t8(e) = t8(x + e’) = e’ (delete left selection event type)
t9(e) = t9((e’)*) = e’ (delete iteration)
and x ∈ A

³ Note that α’ ⊆ α.
Given a regular expression e over A, all elementary transformations ti may be applied to e or to any part of e that is a regular expression over A. For e, e’, e” ∈ R*(A):
(i) e = e’ . e” ⇒ ti(e) = ti(e’ . e”) or ti(e’) . e” or e’ . ti(e”)
(ii) e = e’ + e” ⇒ ti(e) = ti(e’ + e”) or ti(e’) + e” or e’ + ti(e”)
(iii) e = e’* ⇒ ti(e) = ti((e’)*) or (ti(e’))*

The distance between e and e’ is modelled as the shortest T-derivation [12] from e to e’. Let T be a sequence t_i1, …, t_ik of elementary transformations. A T-derivation from e ∈ R*(A) to e’ ∈ R*(A) is a sequence of regular expressions e_0, …, e_k such that e = e_0, e’ = e_k, and t_ij(e_(j-1)) = e_j for 1 ≤ j ≤ k. The length of a T-derivation is the number of transformations in T. It is also equal to the number of regular expressions in the T-derivation minus one.

Now the regular expression size and complexity of an object type can be defined. Let A be the universe of event types and let M ⊆ <P(A), R*(A)> be the set of object types that builds the conceptual model. Let the shortest T-derivation from e ∈ R*(A) to e’ ∈ R*(A) be represented by T-Dmin(e, e’) and let the length of a T-derivation be selected by the function “#”. For all object types P = <α, e> ∈ M, where <α’, e’> are the minimum specifications of P:
Size_regular_expression(P) = #(T-Dmin(e, 1))
Complexity_regular_expression(P) = #(T-Dmin(e, e’))

The corresponding definitions for conceptual models are: Let A be the universe of event types and let M ⊆ <P(A), R*(A)> be the set of object types that builds the conceptual model.
Size_regular_expression(M) = ∑_{P ∈ M} Size_regular_expression(P)
Complexity_regular_expression(M) = ∑_{P ∈ M} Complexity_regular_expression(P)
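For the size measure, i.e., the distance to the expression “1”, the length of a shortest T-derivation can be estimated without searching over derivations. The Python sketch below rests on two assumptions that are not stated in the text: a lone event type x is read as 1 . x, so that a delete transformation reduces it to 1, and every elementary transformation adds or removes exactly one event type operand or one iteration. Under these assumptions the number of operands plus the number of iterations appears to equal the length of a shortest T-derivation to 1. The class names, the function `regex_size` and the Rental expression (inferred from the prose description of the rental life cycle) are hypothetical. The general complexity distance between two arbitrary expressions is not covered.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Event:
    """An event type operand."""
    name: str


@dataclass(frozen=True)
class Seq:
    """Sequence: left . right"""
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Sel:
    """Selection: left + right"""
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Star:
    """Iteration: (body)*"""
    body: "Expr"


Expr = Union[Event, Seq, Sel, Star]


def regex_size(e):
    """Count event type operands and iterations in the expression tree.

    Each delete transformation removes exactly one of these, so the count
    is taken as the number of deletions needed to reach "1" (an assumption).
    """
    if isinstance(e, Event):
        return 1
    if isinstance(e, Star):
        return 1 + regex_size(e.body)
    return regex_size(e.left) + regex_size(e.right)


# Assumed reading of the Rental life cycle: rent . (return + crash)
rental = Seq(Event("rent"), Sel(Event("return"), Event("crash")))
print(regex_size(rental))  # prints 3

# A hypothetical expression with an iteration: a . (b + c)* . d
with_iteration = Seq(Event("a"), Seq(Star(Sel(Event("b"), Event("c"))), Event("d")))
print(regex_size(with_iteration))  # prints 5
```

One derivation of length 3 for the assumed Rental expression deletes ‘crash’ from the selection, then ‘return’ from the sequence, and finally ‘rent’, leaving “1”.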
Example

The following table contains the size and complexity measurements for the Car Rental Company:

                                Car  Client  Rental  Model
Size_alphabet                     7       5       3     15
Size_regular_expression           8       6       3     17
Complexity_alphabet               3       1       1      5
Complexity_regular_expression     3       1       1      5
Discussion

The four metrics discussed so far form a very basic set of specification measures. Every conceptual model and every object type can be assessed using this base set of measures. The measurements inform us about the complexity and size of the application domain as modelled using the M.E.R.O.DE. process algebra. They can therefore be used in empirical research to validate hypothesised relationships between these size and complexity characteristics on the one hand and software quality aspects on the other hand. As mentioned in the introduction, we have yet to proceed to this phase of research.

The main aims of this paper can be summarised as follows:
• We wished to clarify the relationship between software measurement and software quality to position our work within the broader field of quality research. Especially for early quality prediction the measurement of software product abstractions is an essential activity. Therefore these measurement problems must be tackled first.
• Timely measurement is crucial for early quality prediction and control. Whereas the measurement of software design artefacts (i.e., design measurement) has received a lot of attention lately, we argue that measurements of the software problem domain can provide additional insights.
• Our research demonstrates that software measurement can be extended to formal specifications. The use of the M.E.R.O.DE. process algebra must in this respect be considered as merely an illustration of how to measure formal specifications.
• In this paper basic concepts from Measure Theory were used to define measures that really deserve to be called software metrics. It was not investigated whether this approach can be extended to other modelling techniques. However, we believe that every technique with a sufficient level of formality is a candidate for our approach.

Finally, it must be noted that the software measures proposed here are valid as long as one accepts the definitions of size and complexity. However, these definitions are not absolute and they only apply to M.E.R.O.DE. models. In our opinion it is more important to define measures that are valid given the definitions of the concepts they measure than to seek general definitions of, for instance, complexity.
References

[1] Boehm B.W., Software Engineering Economics, Prentice-Hall, 1981.
[2] Cockburn A.A.R., “The impact of object-orientation on application development”, IBM Systems Journal, Vol. 32, No. 3, 1993, pp. 420-444.
[3] Chen P.P., “The Entity-Relationship Model - Towards a Unified View of Data”, ACM Transactions on Database Systems, Vol. 1, No. 1, 1976, pp. 9-36.
[4] Aho A.V. and J.D. Ullman, The Theory of Parsing, Translation and Compiling. Volume 1: Parsing, Prentice-Hall, Series in Automatic Computation, 1972.
[5] Snoeck M., On a Process Algebra Approach for the Construction and Analysis of M.E.R.O.DE.-Based Conceptual Models, PhD dissertation, Katholieke Universiteit Leuven, 1995.
[6] Dedene G. and M. Snoeck, “M.E.R.O.DE.: A Model-driven Entity-Relationship Object-oriented DEvelopment method”, ACM Sigsoft Software Engineering Notes, Vol. 13, No. 3, 1994, pp. 51-61.
[7] Jackson M.A. and J.R. Cameron, System Development, Prentice-Hall, 1983.
[8] Dedene G. and M. Snoeck, “Formal deadlock elimination in an object oriented conceptual schema”, Data and Knowledge Engineering, Vol. 15, No. 1, 1995, pp. 1-30.
[9] Fenton N., Software Metrics: A Rigorous Approach, Chapman & Hall, 1991.
[10] Ejiogu L.O., “Beyond Structured Programming: An Introduction to the Principles of Applied Software Metrics”, Journal of Structured Programming, No. 11, 1990, pp. 27-43.
[11] Suppes P., D.H. Krantz, R.D. Luce and A. Tversky, Foundations of Measurement. Volume II: Geometrical, Threshold and Probabilistic Representations, Academic Press, Inc., 1989.
[12] Zhang K. and D. Shasha, “Simple fast algorithms for the editing distance between trees and related problems”, SIAM Journal on Computing, Vol. 18, No. 6, 1989, pp. 1245-1262.
[13] Oommen B.J., K. Zhang and W. Lee, “Numerical Similarity and Dissimilarity Measures Between Two Trees”, IEEE Transactions on Computers, Vol. 45, No. 12, 1996, pp. 1426-1434.