Modularity in Design: Formal Modeling and Automated Analysis¹

Yuanfang Cai and Kevin J. Sullivan
University of Virginia

Designers often seek modular architectures for complex systems so that their systems better accommodate expected changes, have parts that can be developed and evolved without further coordination, and are easier to understand through abstraction of details hidden within modules. However, current design modeling techniques do not effectively support design modularization and evolution analysis. In this paper, we present a framework that enables automatic and quantifiable modularization and evolution analyses for high-level design abstractions. The framework contributes a model that substantiates the concept of information hiding as a measurable criterion for software designs, an algorithm to extract the dependence structure of an abstract design, and an approach to quantitatively analyzing the changeability of a design. We have imported existing engineering techniques and economic analyses into software design, and have formalized and automated them. We illustrate the potential utility of our modeling and analysis techniques by modeling and analyzing several design examples.

Categories and Subject Descriptors: D.2.10 [Software Engineering]: Design - Methodologies
General Terms: Design, Evolution, Modularity
Additional Key Words and Phrases: Design Modeling, Design Analysis, Software Economics

1. INTRODUCTION

The goals of software engineering research include reducing the cost of software development and evolution, reducing the time to complete development tasks, and improving software quality. To a significant degree, outcomes in all of these dimensions depend on design structure. In particular, designers seek to structure their systems so that they better accommodate expected changes, have parts that can be developed and evolved without further coordination, and are easier to understand through abstraction of details hidden within modules. However, current design modeling techniques do not effectively support design modularization and evolution analysis.

It has also been recognized for decades [Alexander 1970; Ashby 1952; Simon 1996; Stevens et al. 1974; Parnas 1972; Brooks 1987; Dijkstra 1982] that the coupling structures among design decisions determine software modular structure and evolvability. Researchers have explored various ways to analyze source code dependence structures, such as call graphs. However, in many cases designers have to answer important questions before investing in coding. For example, will my design accommodate envisioned changes easily? Given a project deadline, is it worthwhile to invest in making a system more flexible, for example, by refactoring? These questions are difficult to answer because we lack an approach to analyzing the dependence structures of high-level design models.

We have been developing a framework to enable automatic and quantifiable software modularization and evolution analyses for high-level design abstractions. Our framework contributes a model that substantiates the concept of information hiding as a measurable criterion for software designs, an algorithm to extract the dependence structure of an abstract design, and an approach to quantitatively analyzing the changeability of a design. We have imported existing engineering techniques and economic analyses from other design areas to benefit software design [Sullivan et al. 2001], and in our framework we have formalized and automated them. Our framework integrates intellectual contributions from three fields: software design, financial economics, and engineering systems design. Our approach places important but informal earlier work on a logical foundation. We have implemented a supporting tool called Simon [Cai and Sullivan 2005] to automate these abilities. The evaluation of the framework in a set of case studies has produced promising results. In this paper, we present our formal framework and associated analysis techniques, and illustrate their potential utility using several canonical design examples.

The rest of this paper is organized as follows. Section 2 presents the background and a more extensive overview of our work. Starting from a small running example, Section 3 introduces our design modeling approach and analysis techniques informally. Section 4 introduces the core formalization of the modeling and analysis approach for precision. Section 5 presents our tool, Simon. Sections 6 and 7 present our case studies in design modeling and analysis. Section 8 discusses how our model formalizes and integrates different work from software engineering, financial economics, and engineering systems design. Section 9 presents related work. Section 10 discusses related issues and concludes.

¹ This paper is extended from our previous conference papers [Cai and Sullivan 2005; Sullivan et al. 2005; Sullivan et al. 2001], and some of the prose and figures are from them.

ACM Journal Name, Vol. V, No. N, Month 20YY, Pages 1–0??.

2. BACKGROUND AND OVERVIEW

Software modularity and evolvability are closely related. Parnas’s concept of information hiding modularity is a cornerstone of modern software design thought, but its formulation remains informal and its emphasis on changeability has not been made measurable. We need better models of information hiding for both their explanatory power and their prescriptive utility. Software evolution is usually driven by changes in environment conditions and design decisions. However, prevailing design models do not support the modeling of environment conditions and important design decisions. We need more general design modeling techniques to support the changeability analysis of software designs.

In our framework, we use constraint networks (CNs) from artificial intelligence to model design spaces and designs [Mackworth 1977]. Using a CN, we model, explicitly and precisely, the design decisions and environmental conditions possible in any dimension of a design space and the constraints that relate them. We model design dimensions and environment conditions as design variables with discrete domains, possible decisions as the possible values of these variables, and dependencies among choices as constraints. However, at least two design phenomena are indispensable in any design activity but are not readily expressed using a pure constraint network: the aggregation of decisions into modules and the dominance relation among design decisions. For example, agreed interfaces often dominate decisions within modules. We augment constraint networks with a cannot influence relation to model the dominance relations. We use clusterings of variables to model the aggregation of decisions into (proto-)modules.² We call the resulting models augmented constraint networks (ACNs).

² By the term proto-module we mean a group of design decisions that might or might not be independent of other such groups. We reserve the term module for a group of decisions that is independent of other groups (although it might depend on higher-level contracts, design rules, or interfaces that serve to create such independence).

Design abstraction by itself is not enough to analyze design evolution phenomena. Although we have plenty of ways to abstract a design, we lack abstract models of design evolution to handle its complexity. We created a design evolution model called the Design Automaton (DA) to represent design variations driven by changes in environment conditions or design decisions. We also devised an algorithm to derive the evolution models of abstract design models. Our abstraction of software design and design evolution has enabled a series of analyses that answer important design questions.

The first analysis is automatic dependence structure extraction. Dependence structure is the key to design modularization and evolution. Numerous ways of deriving dependence structures from source code have been explored. However, the dependence structures in high-level design models, which precede the coding investment and determine the quality of the implementation, have not been well studied. Based on our abstract design and evolution modeling, we contributed an algorithm to derive the coupling structures from high-level design models.

Another type of analysis enabled by our framework is design impact analysis (DIA). DIA helps the user to rigorously answer a number of important questions in design: given a design and a sequence of expected changes, what are the consequences? How many ways are there to accommodate them? Based on our abstract design evolution model, we formalized the problem, solved it formally, and implemented the solution in our tool. Using DIA, we have automated and quantified Parnas’s qualitative changeability analysis of the KWIC systems [Parnas 1972].

There are design modeling and economic analysis techniques in other design areas that could benefit software engineering. We contributed an approach to automatically deriving Design Structure Matrix (DSM) models from abstract design models. DSM modeling is widely studied and used for design task structuring and optimization in other design areas, such as vehicle design and civil engineering. Importing DSMs into software engineering makes it possible to apply existing engineering techniques and tools to software designs.

In our previous work [Sullivan et al. 2001], we evaluated the potential of a new theory, developed by Baldwin and Clark to account for the influence of modularity on the evolution of the computer industry, to inform software design. The theory uses DSMs to model designs and an economic analysis model called net option value (NOV) to evaluate them. Baldwin and Clark introduce the notion of design rules to arrive at a generalized model of information hiding modularity. A design rule is a decision to agree on a common design parameter (an interface) that decouples other design parameters that would otherwise be coupled. To test the potential utility of the theory for software, we mapped Parnas’s KWIC designs into DSMs and contributed a novel method to quantify changeability, which enabled economic analysis of software designs. The NOV analysis confirmed Parnas’s conclusions with numbers, and the DSM models showed that the information-hiding criterion does have a tangible and measurable form in a DSM. Our framework now supports automatic NOV analysis for abstract design models.

Our framework substantiates the concept of information hiding as a measurable criterion for software designs. As we have shown [Sullivan et al. 2001], in an information hiding design, design rules must be invariant with respect to anticipated changes in environment variables: such changes should be accommodated by changes to hidden variables. The framework we develop in this paper gives us the vocabulary we need to formalize this idea precisely. We could formalize it as a predicate stating that a coupling relation derived from an augmented constraint network should not have any pair with its first element in an environment variable cluster and its second in a design rule cluster.

Our formal models and automated techniques have revealed errors and ambiguities in both our earlier work [Sullivan et al. 2001] and recent work by Lopes et al. [Lopes and Bajracharya 2005]. We also found an oversight in the options value model of Baldwin and Clark that traces to subtleties in the definition of the DSM. Our framework links together an engineering system modeling and analysis technique (DSM), economic analysis (NOV), and software evolution analysis based on information hiding (DIA). Our prototype tool, Simon, supports ACN model development, DSM derivation, NOV computation, and design impact analysis. The details of Simon have been introduced in our recent paper [Cai and Sullivan 2005]. The next section makes these ideas concrete, and elaborates upon them, in terms of a very simple example.
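The information-hiding predicate described in this section can be illustrated with a minimal Python sketch. It is not Simon's implementation; the function names and the variable names (client_needs, matrix_interface) are hypothetical. It assumes a coupling pair (x, y) means that a change to x may force a change to y.

```python
def violations(coupling, environment, design_rules):
    """Coupling pairs (x, y) where an environment variable x impacts a design rule y."""
    return {(x, y) for (x, y) in coupling if x in environment and y in design_rules}


def is_information_hiding(coupling, environment, design_rules):
    """True when no environment change propagates to a design rule."""
    return not violations(coupling, environment, design_rules)


# Hypothetical clusters and coupling relations, for illustration only.
environment = {"client_needs"}
design_rules = {"matrix_interface"}

# Environment changes are absorbed by the hidden variables ds and alg.
good = {("client_needs", "ds"), ("matrix_interface", "ds"), ("matrix_interface", "alg")}

# Here an environment change reaches the interface itself.
bad = good | {("client_needs", "matrix_interface")}

print(is_information_hiding(good, environment, design_rules))
print(violations(bad, environment, design_rules))
```

Returning the offending pairs, rather than only a Boolean, is a design choice of this sketch: it shows which design rule is exposed to which environment variable.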

3. AN EXAMPLE

Our method models design as a decision-making problem, under constraints, where decisions can be grouped into proto-modules, and where some decisions can dominate others. In this section, we model a design comprising choices of a matrix data structure and algorithm to illustrate our modeling and analysis approaches informally. For a matrix design, the best choice depends on how the client uses matrices. An array could be the best choice for a dense matrix, and a linked list could be the best choice for a sparse matrix. In our modeling and analysis approaches, we consider not only design decisions but also environment conditions, such as the client demand characteristics.

3.1 Design Modeling

We represent the dimensions of data structure and algorithm as the variables ds and alg, and represent the client demand condition as the variable matrix. If the choices for the data structure are array, list, and other unelaborated choices, the domain of ds is {array_ds, list_ds, other_ds}. Similarly, the domain of alg is {array_alg, list_alg, other_alg}. The demand condition matrix has domain {dense, sparse}. Our modeling thus spans both design variables and environment variables (which are not formally different from design variables).

We use the binding of a value to a variable, an assignment, to model a given decision or an environment condition. For the matrix example, {ds = array_ds, matrix = dense} is an assignment. If all the variables have values in an assignment, we call the assignment a valuation. For the matrix example, {matrix = dense, ds = array_ds, alg = array_alg} is a valuation.

Next, we model the interdependence relation among design variables and environment conditions as a set of logical constraints. For the matrix design, one example is ds = array_ds ⇒ matrix = dense, modeling that the choice of an array is valid only if the client needs dense matrices. We model the binding of the assuming variable as implying the assumed binding. This might seem counterintuitive, but there could be other data structure choices that are also consistent with density, and we do not want to model an overly constrained design in which array is the only choice. Thus the implication arrows are the opposite of what one might initially expect. The variables, domains, and constraints constitute a finite-domain constraint network (FDCN), the core of our framework.

However, design involves more than logic. First, environment conditions are often outside of the designers’ sphere of influence. The client demand condition matrix is such an example. Likewise, in design activities, some design decisions dominate others. For example, the designer of an interface might prevail upon the implementation designer to conform to the interface specifications. Asymmetric dominance is essential to Baldwin and Clark’s notion of a design rule. We use a binary relation, cannot influence, to model the dominance relation: (x, y) ∈ cannot influence indicates that changes in x cannot be compensated for by changes in y (even if changes in y can be accommodated by changes in x). In the matrix example, we have assumed that the client’s need dominates, and the design decisions must be adapted accordingly. Thus (ds, matrix) and (alg, matrix) should be in the cannot influence relation.

Second, subsets of decisions are often considered as a whole. The design decisions within a class, for instance, are usually considered together. A logical framework does not immediately lend itself to modeling the clustering of design decisions into proto-modules. We simply use an additional structure, cluster, to express the a priori clustering of subsets of variables into proto-modules. The same design can have different clusterings, reflecting different stakeholders’ views of the design.

We call a constraint network augmented with a cannot influence relation and a clustering an augmented constraint network (ACN). Figure 1 shows the matrix ACN developed using our tool, Simon. Simon allows the user to input the cannot influence relation through a matrix GUI, as shown in Figure 1 (B). We can cluster matrix into an environment cluster and the other two variables into a design cluster, as shown in Figure 1 (C).

Fig. 1. Matrix ACN model

Other non-logical data structures could be added to the core model. For example, many economic analyses require additional annotations and auxiliary economic models, such as the cost to change a given design decision. However, these are beyond the scope of this paper.

Representing design spaces using ACNs is idealized, but simple; it clearly captures the notion of design as a decision-making problem under constraint; and it is a reasonable starting point for a formal account of coupling in design viewed as a decision-making activity. The modeling approach is general and abstract, and can be used at any level of detail, from high-level specification and architectural decisions to extremely detailed ones. Many concerns can naturally be represented as variables: response time, security policy, choices of function or class names and signatures, choices of design patterns to use, etc. These concerns play important roles in software evolution, but cannot be modeled and analyzed effectively in prevailing design modeling methods.

3.2 Analyzable Design Space Modeling

Representing designs with constraints isn’t new [Mackworth 1977]. It’s the basis for much work on artificial intelligence in design. Our work is different. We do not seek to automate design as search, the goal of AI-in-design work. We want to model and analyze the coupling structures on design decisions, in order to develop formal accounts of central but informal ideas in software engineering. Modeling the design variation under the coupling structure defined by an ACN is the key to our goal. In this subsection, we present such a design variation model, which serves as the basis for later analyses.

A valuation satisfies a constraint if and only if its projection onto the variables of that constraint is consistent with at least one permitted assignment of that constraint. For example, the valuation {ds = array_ds, matrix = dense, alg = array_alg} satisfies the constraint ds = array_ds ⇒ matrix = dense because one of the constraint’s permitted assignments, {ds = array_ds, matrix = dense}, is a subset of the valuation. A valid design is modeled as a solution of the FDCN. All the valid designs constitute a design space. Table I shows the matrix design space.

Table I. Matrix Design Space

state 0    ds = other_ds    matrix = sparse    alg = other_alg
state 1    ds = other_ds    matrix = dense     alg = other_alg
state 2    ds = array_ds    matrix = dense     alg = array_alg
state 3    ds = array_ds    matrix = dense     alg = other_alg
state 4    ds = list_ds     matrix = sparse    alg = other_alg
state 5    ds = list_ds     matrix = sparse    alg = list_alg

Changing a design decision can violate given constraints. In our matrix example, starting with the design {(matrix = dense), (ds = array_ds), (alg = array_alg)} and changing ds to list_ds violates a constraint, producing an invalid design state. If such an invalidating change must stick, consistency restoration demands changes to some subset of the other variables. When the matrix data structure is changed, the algorithm has to be reconsidered, which we model as changing alg to list_alg or other_alg.

A nondeterministic finite automaton, with each solution as a state, can be used to represent such constraint network perturbations. Since changes in design decisions or environment conditions often drive software evolution, we call such an automaton a design automaton (DA). Figure 2 presents a part of the DA for the matrix example, starting from state 2, where the client needs dense matrices and the design therefore uses an array data structure. Changing the client preference to sparse makes the design inconsistent. Making a set of minimal changes to other variables to restore consistency leads to states 0, 4, or 5.
Each transition in a DA is minimal, that is, the destination state differs only minimally from the previous state, in the sense that no compensating change could be undone while still preserving consistency. For the matrix example, if the matrix is in design state 2 as shown in Figure 2, and ds is changed to other_ds, then there are at least two designs that can accommodate the change: {ds = other_ds, matrix = sparse, alg = other_alg} (state 0) and {ds = other_ds, matrix = dense, alg = other_alg} (state 1). Changing alg to other_alg is necessary in both states 0 and 1, but changing matrix to sparse in state 0 is not. We therefore consider only the transition from state 2 to state 1 as minimal.

No transition in a DA may violate the cannot influence relation. If (x, y) is in cannot influence, then among all the possible ways to restore consistency in the face of a change to x, those involving y are excluded. For the matrix example, because (ds, matrix) is in cannot influence, the transition starting from state 2, triggered by changing ds from array_ds to list_ds, and leading to the client change in state 5 (the dotted arrow labeled ds = list_ds) is precluded.


Fig. 2. Partial matrix design automaton
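The design space and the minimal-transition rule of this subsection can be reproduced with a brute-force Python sketch. It is only an illustration, not Simon's algorithm. Only the constraint ds = array_ds ⇒ matrix = dense is stated in the text; the other three constraints are assumptions chosen so that the enumeration matches Table I. The function names are hypothetical, and minimality is interpreted here as "no non-empty subset of the compensating changes can be reverted without breaking a constraint".

```python
from itertools import combinations, product

DOMAINS = {
    "ds": ["array_ds", "list_ds", "other_ds"],
    "matrix": ["dense", "sparse"],
    "alg": ["array_alg", "list_alg", "other_alg"],
}
# (x, y): a change to x may not be compensated for by changing y.
CANNOT_INFLUENCE = {("ds", "matrix"), ("alg", "matrix")}


def is_valid(d):
    """Check a valuation against the constraints, each written as (premise, conclusion)."""
    rules = [
        (d["ds"] == "array_ds", d["matrix"] == "dense"),   # stated in the text
        (d["ds"] == "list_ds", d["matrix"] == "sparse"),   # assumed
        (d["alg"] == "array_alg", d["ds"] == "array_ds"),  # assumed
        (d["alg"] == "list_alg", d["ds"] == "list_ds"),    # assumed
    ]
    return all((not premise) or conclusion for premise, conclusion in rules)


def design_space():
    """All valuations that satisfy every constraint."""
    names = list(DOMAINS)
    candidates = (dict(zip(names, values)) for values in product(*DOMAINS.values()))
    return [d for d in candidates if is_valid(d)]


def successors(state, var, value):
    """Minimal, permitted destination states after forcing var := value."""
    result = []
    for cand in design_space():
        if cand[var] != value:
            continue
        changed = [v for v in DOMAINS if v != var and cand[v] != state[v]]
        # Compensating changes may not touch variables that var cannot influence.
        if any((var, v) in CANNOT_INFLUENCE for v in changed):
            continue
        # Not minimal if some compensating changes can be undone consistently.
        undoable = any(
            is_valid({**cand, **{v: state[v] for v in subset}})
            for r in range(1, len(changed) + 1)
            for subset in combinations(changed, r)
        )
        if not undoable:
            result.append(cand)
    return result


state2 = {"ds": "array_ds", "matrix": "dense", "alg": "array_alg"}
for dest in successors(state2, "matrix", "sparse"):
    print(dest)
```

Under these assumptions the design space has six valuations, the change matrix = sparse from state 2 has three minimal destinations (states 0, 4, and 5 of Table I), and the change ds = list_ds from state 2 has none, because every way of restoring consistency would change matrix.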

3.3 Analysis

The DA concept enables a range of automated analysis techniques, including Design Structure Matrix (DSM) derivation, Baldwin and Clark’s net option value (NOV) computation based on DSMs, and Design Impact Analysis (DIA). Since the NOV computation has been introduced in detail in previous work by Sullivan et al. [Sullivan et al. 2001; Sullivan et al. 2005] and Lopes et al. [Lopes and Bajracharya 2005], in this paper we mention it only briefly and mainly present the other two analyses.

3.3.1 DSM Derivation. DSMs originated in the work of Steward [Steward 1981], and were later developed by Eppinger and others [Eppinger 1991]. They are widely studied and used for design task structuring and optimization in a range of industries, such as vehicle design and civil engineering. DSM modeling has been supported by tools such as DeMAID [James L 1996], and is at the heart of Baldwin and Clark’s work. DSMs provide a general model and accessible visual representations of design dependence structures.

DSMs are square matrices in which the rows and columns are labeled with design variables, representing dimensions in which design decisions are made. A mark in a cell denotes a pairwise dependence of the variable on the row on the variable on the column. A design or a part of a design is modular if its elements do not depend on each other (although they may depend on external contracts, or design rules, that serve to decouple them). Modular implementations can proceed in parallel: these elements can be developed or changed independently, modulo their conformance to any prevailing contracts.

The problem with DSMs, from our perspective, is that they are informal and thus lack important semantic information. The key to formalizing DSMs is to formally define pairwise dependence between design variables. In this paper we show that the marks in a DSM are best understood as a highly aggregated summary of the dependence information in a DA.
We thus formalize a reduction from a DA to a DSM. Intuitively, for some consistent design state s in a DA, if there is some change to a variable such that the value of another variable is changed in some minimally perturbed destination state s′ of the DA, we define these two design variables to be pairwise dependent. We define the coupling structure of a design ACN as the dependence relation over all of its variables. For the matrix example, if the original design is {(matrix = sparse), (ds = list_ds), (alg = list_alg)} and the envisioned change in the client is (matrix = dense), there are three new designs accommodating this change in its DA: {(matrix = dense), (ds = array_ds), (alg = array_alg)}, {(matrix = dense), (ds = array_ds), (alg = other_alg)}, and {(matrix = dense), (ds = other_ds), (alg = other_alg)}. The coupling relation computed this way includes (matrix, ds), (matrix, alg), (ds, alg), and (alg, ds).

A DSM can be seen as composed of a pairwise coupling relation and an a priori clustering of variables. A graphical depiction of the resulting matrix can provide a high-level view of the dependence structure of a design organized as a set of proto-modules. A coupling relation derived from an ACN can be used to populate a DSM, and the clustering structure of the ACN can be used to determine the order in which the rows and columns are presented. Figure 3 is the DSM that Simon generated from the matrix ACN model.

Fig. 3. Matrix DSM generated by Simon

For what appears to be the first time, we provide rigorous semantics for DSMs, and enable their automated generation from precise logical models. The derivation of DSMs from ACNs links precise, abstract logical design representations with tools and methods already developed around DSMs [Yassine 2004], and most notably with the theoretical work of Baldwin and Clark on the connections between design architecture and the economy.

3.3.2 Quantitative Analysis with Net Option Value. Sullivan et al. [Sullivan et al. 2001; Sullivan et al. 2005] and Lopes et al. [Lopes and Bajracharya 2005] have previously used net option value analysis [Baldwin and Clark 2000] to quantitatively compare software designs modeled by DSMs. Baldwin and Clark’s theory is based on the idea that modularity provides a portfolio of options.
They define a model for reasoning about the value added to a system by its modularity. Splitting a design into m modules increases its base value $S_0$ by a fraction obtained by summing the net option values ($NOV_i$) of the resulting options. NOV is the expected payoff of exercising a search and substitute option optimally, accounting for both the benefits and the cost of exercising options:

\[ V = S_0 + NOV_1 + NOV_2 + \cdots + NOV_m \]

\[ NOV_i = \max_{k_i} \{ \sigma_i n_i^{1/2} Q(k_i) - C_i(n_i) k_i - Z_i \} \]

For module i, $\sigma_i n_i^{1/2} Q(k_i)$ is the expected benefit to be gained by accepting the best positive-valued candidate generated by $k_i$ independent experiments. $C_i(n_i) k_i$ is the cost to run $k_i$ experiments, as a function $C_i$ of the module complexity $n_i$. $Z_i = \sum_{j \text{ sees } i} c\, n_j$ is the cost of changing the modules that depend on module i. The max picks the number of experiments that maximizes the gain for module i. Details of the NOV model can be found in the literature [Baldwin and Clark 2000; Sullivan et al. 2001; Lopes and Bajracharya 2005].

The most important parameters for NOV analysis are technical potential, σ, complexity, n, and visibility cost, Z. Technical potential is the expected variance on the rate of return on an investment in producing a variant of a module implementation, that is, risk with commensurate rewards for success. On the assumption that the prevailing implementation of a module is adequate, the expected rate of return on investments is proportional to changes in requirements that drive the evolution of the module’s specification. Consequently, a module whose specification does not change has low technical potential. Complexity can be measured as the size of the artifact as a proportion of the overall system, using the number of design variables, lines of code, etc. The visibility cost measures the cost incurred by dependences between modules.

We now make simplistic assumptions for the matrix example to compute its system NOV. We approximate σ as the probability of change, and assume that the client will change for sure, giving it the highest σ: 100%. From the matrix DSM, both data structure and algorithm depend on the client, and they have the same probability of change. The complexity only matters for design variables, and we assume that the data structure and algorithm share the system complexity equally. The client influences all the other variables, and has the highest visibility cost, 1.
The algorithm and data structure depend on each other, and each has a visibility cost of 0.5. Figure 4 shows the NOV computation for the matrix example using Simon. The computed system NOV is 0, which means that the system is not flexible at all. If the dependence between ds and alg can be removed to make the system better modularized, that is, if the visibility costs of ds and alg are 0, then the system NOV increases to 0.41, indicating more value from flexibility.

In earlier work [Sullivan et al. 2001], we computed the NOVs for the two KWIC designs in Parnas’s seminal paper [Parnas 1972]. The NOV of the more flexible information hiding design was 1.56, while the NOV of the sequential design was only 0.26. A benefit of the mathematical model is that it supports rigorous sensitivity analysis. It remains an open challenge to justify precise estimates for real options in software design. As discussed in our earlier work, estimating technical potential is difficult [Sullivan et al. 2001]. However, as a back-of-the-envelope model, it provides ballpark figures and useful insights.

3.3.3 Design Impact Analysis. Design impact analysis (DIA) helps the user to rigorously answer some of the most fundamental questions in design: given a design and a sequence of expected changes, what are the consequences? How many ways are there to accommodate them? Impact analysis for source code has long been studied [Arnold and Bohner 1996], but we lack the counterpart for abstract design models. Pioneers like Parnas have done such analyses, but in qualitative ways [Parnas 1972]. In this paper we present a first step toward automating and quantifying such analysis for precise but abstract design models.


Fig. 4. Matrix NOV Computation
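The NOV formula of Section 3.3.2 can be evaluated numerically with a short Python sketch. It is a rough illustration under stated assumptions, not the computation performed by Simon: Q(k) is taken to be the expected value of the best of k independent standard normal draws when that value is positive, computed by a simple trapezoid rule; the cost per experiment stands in for the value of C_i(n_i); and the result is floored at zero on the assumption that an option need not be exercised. All function and parameter names are hypothetical, and the parameter values in the usage line are arbitrary, so the sketch is not expected to reproduce the figures reported for the matrix example.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def q(k, upper=8.0, steps=4000):
    """Expected positive part of the best of k standard normal draws.

    Evaluates k * integral_0^upper z * Phi(z)^(k-1) * phi(z) dz by the trapezoid rule.
    """
    if k <= 0:
        return 0.0
    h = upper / steps

    def integrand(z):
        return z * k * normal_cdf(z) ** (k - 1) * normal_pdf(z)

    inner = sum(integrand(i * h) for i in range(1, steps))
    return h * (0.5 * (integrand(0.0) + integrand(upper)) + inner)


def nov(sigma, n, cost_per_experiment, visibility_cost, max_k=20):
    """Net option value of one module, maximized over the number of experiments k."""
    best = max(
        sigma * math.sqrt(n) * q(k) - cost_per_experiment * k - visibility_cost
        for k in range(1, max_k + 1)
    )
    return max(0.0, best)  # assumption: the option may be left unexercised


def system_value(base_value, module_novs):
    """V = S0 + NOV_1 + ... + NOV_m."""
    return base_value + sum(module_novs)


# Arbitrary illustrative parameters for two equally complex modules.
print(system_value(1.0, [nov(1.0, 0.5, 0.1, 0.0), nov(1.0, 0.5, 0.1, 0.5)]))
```

Varying the visibility cost in such a sketch shows the qualitative effect discussed in the text: a module whose visibility cost exceeds its best expected benefit contributes nothing to the system value.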

An answer to the question above can be formulated as a mapping from a DA, an assignment modeling the current design, and a sequence of variable-value pairs that model changes, to a set of sequences of consistent design states modeling the feasible evolution paths for the given sequence of changes. Given the DA, the solution is straightforward: we find the paths that start from the initial design and follow the edges labeled with the specified changes.

Figure 5 shows the full DA of the matrix example. Evolution paths follow the arrows labeled with assignments. Figure 6 shows the input of the DIA analysis for the matrix example in Simon. The selected original design is {(matrix = dense), (ds = array_ds), (alg = array_alg)} (design 2), and the matrix is changed from dense to sparse (shown in the lower box). Figure 7 shows the DIA output of Simon. The upper box shows the evolution paths accommodating this change. The new design accommodating this change could be design 0, 4, or 5. The lower box shows the currently selected new design, 5, and the middle box shows the differences between the new design, 5, and the old one, 2.

3.3.4 Summary. Abilities such as these have the potential to help with rigorous analysis of important and difficult decisions in software design. Associating costs with change, for example, would allow one to analyze the long-term cost-of-change implications of a choice among possible compensating actions today.
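The path-following view of DIA, and the reduction from a DA to a coupling relation described in Section 3.3.1, can both be sketched in Python over an explicit DA. This is an illustrative sketch, not Simon's implementation. The state numbering follows Table I, the DA below is only the fragment of transitions mentioned in the text (not the full automaton), and the names impact and coupling are hypothetical.

```python
# States numbered as in Table I.
STATES = {
    0: {"ds": "other_ds", "matrix": "sparse", "alg": "other_alg"},
    1: {"ds": "other_ds", "matrix": "dense", "alg": "other_alg"},
    2: {"ds": "array_ds", "matrix": "dense", "alg": "array_alg"},
    3: {"ds": "array_ds", "matrix": "dense", "alg": "other_alg"},
    4: {"ds": "list_ds", "matrix": "sparse", "alg": "other_alg"},
    5: {"ds": "list_ds", "matrix": "sparse", "alg": "list_alg"},
}

# Fragment of the DA: (source state, change) -> set of destination states.
DA = {
    (2, ("matrix", "sparse")): {0, 4, 5},
    (2, ("ds", "other_ds")): {1},
    (5, ("matrix", "dense")): {1, 2, 3},
}


def impact(da, start, changes):
    """Evolution paths from `start` that follow the edges labeled with `changes`."""
    paths = [[start]]
    for change in changes:
        paths = [
            path + [dest]
            for path in paths
            for dest in sorted(da.get((path[-1], change), ()))
        ]
    return paths


def coupling(da, states):
    """Pairs (x, y) such that some transition triggered by x also changes y."""
    pairs = set()
    for (source, (var, _value)), dests in da.items():
        for dest in dests:
            for other in states[source]:
                if other != var and states[source][other] != states[dest][other]:
                    pairs.add((var, other))
    return pairs


print(impact(DA, 2, [("matrix", "sparse")]))
print(sorted(coupling(DA, STATES)))
```

On this fragment, the single change matrix = sparse from design 2 yields three evolution paths, ending in designs 0, 4, and 5, and the derived coupling pairs are (matrix, ds), (matrix, alg), and (ds, alg). The pair (alg, ds) reported in the text would require transitions that are not part of this fragment.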


Transition    Decision
0             matrix = dense
1             matrix = sparse
2             ds = list_ds
3             ds = array_ds
4             ds = other_ds
5             alg = list_alg
6             alg = array_alg
7             alg = other_alg

state 0    ds = other_ds    matrix = sparse    alg = other_alg
state 1    ds = other_ds    matrix = dense     alg = other_alg
state 2    ds = list_ds     matrix = sparse    alg = list_alg
state 3    ds = list_ds     matrix = sparse    alg = other_alg
state 4    ds = array_ds    matrix = dense     alg = other_alg
state 5    ds = array_ds    matrix = dense     alg = array_alg

Fig. 5. Matrix full DA generated by Simon
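The path search that DIA performs over a DA such as the one in Figure 5 can be illustrated with a minimal Python sketch. It is not Simon's implementation. It assumes the six design states read off Figure 5 (the grouping of values into states is an assumption), ignores the cannot_influence relation, and treats a compensating state as minimal when no other candidate changes a proper subset of the same bindings. The names `STATES`, `binding_diff`, `transition`, and `impact` are hypothetical.

```python
# Six consistent design states of the matrix example, as read off Figure 5
# (the pairing of values into states is an assumption of this sketch).
STATES = [
    {"matrix": "sparse", "ds": "other_ds", "alg": "other_alg"},
    {"matrix": "dense", "ds": "other_ds", "alg": "other_alg"},
    {"matrix": "sparse", "ds": "list_ds", "alg": "list_alg"},
    {"matrix": "sparse", "ds": "list_ds", "alg": "other_alg"},
    {"matrix": "dense", "ds": "array_ds", "alg": "other_alg"},
    {"matrix": "dense", "ds": "array_ds", "alg": "array_alg"},
]


def binding_diff(start, end):
    """Bindings of `end` that differ from `start`, as (variable, value) pairs."""
    return frozenset((var, end[var]) for var in end if start[var] != end[var])


def transition(start, change):
    """Minimal consistent states that contain the requested change."""
    var, value = change
    candidates = [s for s in STATES if s[var] == value]
    diffs = [binding_diff(start, s) for s in candidates]
    # Keep a candidate only if no other candidate changes a proper subset.
    return [s for s, d in zip(candidates, diffs)
            if not any(other < d for other in diffs)]


def impact(design, changes):
    """All evolution paths that consume the changes one at a time."""
    paths = [[design]]
    for change in changes:
        paths = [path + [nxt]
                 for path in paths
                 for nxt in transition(path[-1], change)]
    return paths


original = {"matrix": "dense", "ds": "array_ds", "alg": "array_alg"}
paths = impact(original, [("matrix", "sparse")])
for path in paths:
    # Differences between each reachable design and the original one.
    print(sorted(binding_diff(original, path[-1])))
```

Under these assumptions the dense array design has three ways to accommodate the change to a sparse matrix, which agrees with the number of new designs reported for Figure 7.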

4. CORE FORMALIZATION

To be precise, we have formalized the four core elements of our modeling and analysis approaches in Z [Spivey 2000]: constraint network, design automaton, coupling relation derivation, and design impact analysis. Our definition follows Tsang’s widely cited but somewhat informal definition of FDCN [Tsang 1993].

4.1 Finite Domain Constraint Network

We use logical finite domain constraint networks (FDCNs) as the core of our framework. Using FDCNs, each dimension of a design space is represented by a variable. The domains of the variables model available choices in each dimension (which can include an unelaborated other). A decision is represented by binding a value to a variable. Constraints express dependences among decisions. A design is modeled as a valuation of variables that satisfies all of the constraints. A design space is modeled as the set of satisfying valuations. In Z, we abstractly specify variables and values as basic sets:

\[ [\mathit{Variable}, \mathit{Value}] \]

The domains of variables are specified as a relation between variables and values.

Fig. 6. Matrix NOV Computation

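\[
\begin{array}{l}
\mathbf{Domains} \\
\quad \mathit{domain} : \mathit{Variable} \leftrightarrow \mathit{Value}
\end{array}
\]

In order to specify constraints, we first specify the notions of assignment and valuation as below.

\[
\begin{array}{l}
\mathbf{Assignment} \\
\quad \mathit{Domains} \\
\quad \mathit{valueof} : \mathit{Variable} \rightarrow \mathit{Value} \\
\hline
\quad \forall v : \mathrm{dom}\,\mathit{valueof} \bullet \mathit{valueof}(v) \in \mathit{domain}(\!|\{v\}|\!) \qquad (4.1)
\end{array}
\]

This schema formalizes that an assignment has a domain, and the function valueof maps variables to their values. Line 4.1 ensures that the value assigned to a variable must respect its domain.

\[
\begin{array}{l}
\mathit{valuations} : \mathit{ConstraintNetwork} \rightarrow \mathbb{F}\,\mathit{Assignment} \\
\hline
\forall \mathit{cn} : \mathit{ConstraintNetwork} \bullet \\
\quad \mathit{valuations}(\mathit{cn}) = \{\mathit{asgn} : \mathit{Assignment} \mid \mathrm{dom}\,\mathit{asgn}.\mathit{valueof} = \mathit{cn}.V\} \qquad (4.2)
\end{array}
\]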

Fig. 7. Matrix NOV Computation

The function valuations maps a constraint network to a set of Assignment. Line 4.2 ensures that the domain of each assignment is the whole variable set of the constraint network. We call such an assignment a valuation. In FDCN, a constraint is modeled as a set of permitted assignments to the variables to which it applies.

\[
\begin{array}{l}
\mathbf{Constraints} \\
\quad \mathit{constraints} : \mathbb{F}\,\mathit{Variable} \leftrightarrow \mathbb{F}\,\mathit{Assignment} \\
\hline
\quad \forall \mathit{varset} : \mathbb{F}\,\mathit{Variable};\ \mathit{asgnset} : \mathbb{F}\,\mathit{Assignment} \bullet \\
\qquad \mathit{asgnset} \in \mathit{constraints}(\!|\{\mathit{varset}\}|\!) \Rightarrow \\
\qquad (\forall \mathit{asgn} : \mathit{asgnset} \bullet \mathrm{dom}\,\mathit{asgn}.\mathit{valueof} = \mathit{varset}) \qquad (4.3)
\end{array}
\]

The function constraints maps a set of variables under constraint to a set of permitted assignments. Line 4.3 ensures that the domain of each assignment is the variable set. For the matrix example, the constraint ds = array_ds ⇒ matrix = dense is modeled as the following set: {{ds = array_ds, matrix = dense}, {ds = other_ds, matrix = dense}, {ds = other_ds, matrix = sparse}}. A solution of an FDCN is a valuation satisfying all constraints.
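As a concrete illustration of constraints as sets of permitted assignments, and of a solution as a valuation consistent with every constraint, the following Python sketch enumerates a tiny FDCN by brute force. It is a sketch under stated assumptions, not part of the formal model: the two-value domain for ds is assumed from the permitted set quoted above, and the names `DOMAINS`, `valuations`, `permitted`, and `solutions` are hypothetical.

```python
from itertools import product

# Assumed domains: only the values that appear in the permitted set above.
DOMAINS = {"ds": ["array_ds", "other_ds"], "matrix": ["dense", "sparse"]}


def valuations(domains):
    """Every assignment that binds each variable to a value of its domain."""
    names = sorted(domains)
    return [dict(zip(names, values))
            for values in product(*(domains[n] for n in names))]


def permitted(varset, predicate, domains):
    """A logical constraint as the set of assignments over varset it permits."""
    restricted = {v: domains[v] for v in varset}
    return [a for a in valuations(restricted) if predicate(a)]


def solutions(domains, constraints):
    """Valuations that extend some permitted assignment of every constraint."""
    return [val for val in valuations(domains)
            if all(any(a.items() <= val.items() for a in allowed)
                   for allowed in constraints)]


def array_implies_dense(a):
    # ds = array_ds => matrix = dense
    return a["ds"] != "array_ds" or a["matrix"] == "dense"


allowed = permitted({"ds", "matrix"}, array_implies_dense, DOMAINS)
for assignment in allowed:
    print(assignment)
```

Brute-force enumeration is exponential in the number of variables, so it only serves to make the definitions tangible on very small networks.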

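\[
\begin{array}{l}
\mathit{solutions} : \mathit{ConstraintNetwork} \rightarrow \mathbb{F}\,\mathit{Assignment} \\
\hline
\forall \mathit{cn} : \mathit{ConstraintNetwork} \bullet \mathit{solutions}(\mathit{cn}) = \{\mathit{val} : \mathit{Assignment} \mid \\
\quad \mathit{val} \in \mathit{valuations}(\mathit{cn}) \land \qquad (4.4) \\
\quad (\forall \mathit{asgnset} : \mathrm{ran}\,\mathit{cn}.\mathit{constraints} \bullet (\exists \mathit{conasgn} : \mathit{asgnset} \bullet \mathit{conasgn}.\mathit{valueof} \subseteq \mathit{val}.\mathit{valueof}))\} \qquad (4.5)
\end{array}
\]

The function solutions maps an FDCN to a set of assignments. The constraint ensures that these assignments are solutions. Line 4.4 ensures that each assignment is a valuation. Line 4.5 ensures that for any constraint, there exists a permitted assignment that is a subset of (consistent with) the given valuation.

\[
\begin{array}{l}
\mathbf{ConstraintNetwork} \\
\quad V : \mathbb{F}\,\mathit{Variable} \\
\quad \mathit{Domains} \\
\quad \mathit{Constraints} \\
\hline
\quad \mathrm{dom}\,\mathit{domain} = V \qquad (4.6) \\
\quad \mathrm{dom}\,\mathit{constraints} = \mathbb{F}\,V \qquad (4.7)
\end{array}
\]

This schema formalizes the notion of FDCN as a triple (V, D, C). V is a finite set of variables; D, their domains; and C, a set of constraints on subsets of V. Line 4.6 ensures that D is defined over the FDCN variable set. Line 4.7 ensures that C constrains the FDCN variable set.

4.2 Design Automaton and its Derivation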

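The following schema specifies a Design Automaton (DA), which consists of a set of states, an alphabet that is a set of variable-value pairs, and a transition function.

\[
\begin{array}{l}
\mathbf{DA} \\
\quad \mathit{states} : \mathbb{F}\,\mathit{Assignment} \\
\quad \mathit{alphabet} : \mathbb{F}(\mathit{Variable} \times \mathit{Value}) \\
\quad \mathit{transition} : (\mathit{Assignment} \times (\mathit{Variable} \times \mathit{Value})) \rightarrow \mathbb{F}\,\mathit{Assignment} \\
\hline
\quad \mathrm{dom}(\mathrm{dom}\,\mathit{transition}) = \mathit{states} \\
\quad \mathit{alphabet} = \mathrm{ran}(\mathrm{dom}\,\mathit{transition}) \\
\quad \mathrm{ran}\,\mathit{transition} = \mathbb{F}\,\mathit{states}
\end{array}
\]

As introduced in the previous section, a DA has two properties: it is minimal and privileged.

Minimality: Each transition between designs is minimal. That is, given a consistent state X : (v_1, ..., v_i, ..., v_n) and a change in the value of variable i from v_i to v_i′, a minimal perturbation is a new consistent state with the given change present and possibly with additional changes, none of which can be undone while preserving consistency. In general, there are several ways to compensate for a change, so a DA is nondeterministic.

Privileged: The cannot_influence relation must not be violated. If (x, y) is in cannot_influence, then among all the possible ways to restore consistency in the face of a change to x, those involving y are excluded.

The function acn2da specifies the mapping from an ACN to a DA to ensure the two properties. It uses the following helper functions:

\[
\begin{array}{l}
\mathit{variablesInBinding} : \mathbb{F}(\mathit{Variable} \times \mathit{Value}) \rightarrow \mathbb{F}\,\mathit{Variable} \\
\hline
\forall \mathit{binding} : \mathbb{F}(\mathit{Variable} \times \mathit{Value}) \bullet \\
\quad \mathit{variablesInBinding}(\mathit{binding}) = \{\mathit{var} : \mathit{Variable} \mid (\exists \mathit{abind} : \mathit{binding} \bullet \mathit{var} = \mathit{first}\,\mathit{abind})\}
\end{array}
\]

The function variablesInBinding extracts the variables involved in an assignment.

\[
\begin{array}{l}
\mathit{bindingDiff} : (\mathit{Assignment} \times \mathit{Assignment}) \rightarrow \mathbb{F}(\mathit{Variable} \times \mathit{Value}) \\
\hline
\forall \mathit{from}, \mathit{to} : \mathit{Assignment} \bullet \mathit{bindingDiff}(\mathit{from}, \mathit{to}) = \{\mathit{var} : \mathit{Variable};\ \mathit{value} : \mathit{Value} \mid \\
\quad (\mathit{from}.\mathit{valueof}(\mathit{var}) \neq \mathit{to}.\mathit{valueof}(\mathit{var})) \land (\mathit{value} = \mathit{to}.\mathit{valueof}(\mathit{var}))\}
\end{array}
\]

The function bindingDiff compares two assignments and returns the bindings in the second assignment that are different from those in the first assignment.

\[
\begin{array}{l}
\mathit{replace} : (\mathit{Assignment} \times \mathbb{F}(\mathit{Variable} \times \mathit{Value})) \rightarrow \mathit{Assignment} \\
\hline
\forall \mathit{from}, \mathit{to} : \mathit{Assignment};\ \mathit{bindingset} : \mathbb{F}(\mathit{Variable} \times \mathit{Value}) \bullet \\
\quad \mathit{replace}(\mathit{from}, \mathit{bindingset}) = \mathit{to} \Leftrightarrow \\
\qquad (\forall \mathit{binding} : \mathit{bindingset} \bullet \mathit{to}.\mathit{valueof}(\mathit{first}\,\mathit{binding}) = \mathit{second}\,\mathit{binding}) \land \\
\qquad (\forall \mathit{binding} : \mathit{from}.\mathit{domain} \setminus \mathit{bindingset} \bullet \mathit{to}.\mathit{valueof}(\mathit{first}\,\mathit{binding}) = \mathit{from}.\mathit{valueof}(\mathit{first}\,\mathit{binding}))
\end{array}
\]

The function replace assigns different values to a set of variables in an assignment, and returns a new assignment.

\[
\begin{array}{l}
\mathbf{cannot\_influence} \\
\quad \mathit{cntinf} : \mathit{Variable} \leftrightarrow \mathit{Variable}
\end{array}
\]

\[
\begin{array}{l}
\mathit{acn2da} : (\mathit{ConstraintNetwork} \times \mathit{cannot\_influence}) \rightarrow \mathit{DA} \\
\hline
\forall \mathit{cn} : \mathit{ConstraintNetwork};\ \mathit{cif} : \mathit{cannot\_influence};\ \mathit{da} : \mathit{DA} \bullet \mathit{acn2da}(\mathit{cn}, \mathit{cif}) = \mathit{da} \Rightarrow \\
\quad (\mathit{da}.\mathit{alphabet} = \mathit{cn}.\mathit{domain}) \land \mathit{da}.\mathit{states} = \mathit{solutions}(\mathit{cn}) \land \qquad (4.8) \\
\quad (\forall \mathit{start} : \mathit{valuations}(\mathit{cn});\ \mathit{change} : \mathit{cn}.\mathit{domain};\ \mathit{endstates} : \mathbb{F}\,\mathit{Assignment} \bullet \\
\qquad \mathit{da}.\mathit{transition}(\mathit{start}, \mathit{change}) = \mathit{endstates} \Rightarrow \\
\qquad (\forall \mathit{end} : \mathit{endstates} \bullet (\mathit{change} \in \mathit{end}.\mathit{valueof}) \land \qquad (4.9) \\
\qquad\quad (\forall \mathit{sub} : \mathbb{F}(\mathit{Variable} \times \mathit{Value}) \mid \mathit{sub} \subset \mathit{bindingDiff}(\mathit{start}, \mathit{end}) \bullet \mathit{replace}(\mathit{start}, \mathit{sub}) \notin \mathit{solutions}(\mathit{cn})) \land \qquad (4.10) \\
\qquad\quad (\forall \mathit{var} : \mathit{variablesInBinding}(\mathit{bindingDiff}(\mathit{start}, \mathit{end})) \bullet \mathit{var} \notin \mathit{cif}.\mathit{cntinf}(\!|\{\mathit{first}\,\mathit{change}\}|\!)))) \qquad (4.11)
\end{array}
\]

The function acn2da specifies the formal mapping from a CN and a cannot_influence relation to a DA. Line 4.8 states that the states of the DA should be the solutions of the CN, and the DA alphabet should be the domain of the CN. Line 4.9 states that a given change must be present in the destination state. Line 4.10 ensures minimality, specifying that no subset of their differences would produce a consistent state. Line 4.11 ensures the privileged property. In this line, first change gives the variable x that triggers the change, and var is a changed variable. The line ensures that (x, var) is not in the cannot_influence relation.

4.3 Pair-Wise Coupling Relation and its Derivation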

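We can now develop a formal definition of what it means for two design decisions to be coupled. The DA encodes complete coupling information. In general, pairwise coupling relations summarize (lose) information in the DA. Yet pairwise coupling relations are useful in practice. Among other things, they’re what design structure matrices represent, and DSMs are demonstrably useful in practice.

In a DA, each transition involves a set of variables for which changes in values are minimally sufficient to compensate for the given change from the given state. In general, there are several destination states, corresponding to the different ways to compensate for that change. The function mcsgroup summarizes the changed variables involved in all such destination states.

\[
\begin{array}{l}
\mathit{mcsgroup} : (\mathit{DA} \times \mathit{Variable} \times \mathit{Assignment} \times \mathit{Value}) \rightharpoonup \mathbb{F}(\mathbb{F}\,\mathit{Variable}) \\
\hline
\forall x : \mathit{Variable};\ \mathit{sol} : \mathit{Assignment};\ v : \mathit{Value};\ \mathit{da} : \mathit{DA} \bullet \\
\quad \mathit{mcsgroup}(\mathit{da}, x, \mathit{sol}, v) = \{\mathit{varset} : \mathbb{F}\,\mathit{Variable} \mid \qquad (4.12) \\
\qquad (\exists \mathit{end} : \mathit{da}.\mathit{transition}(\mathit{sol}, (x, v)) \bullet \mathit{varset} = \mathit{variablesInBinding}(\mathit{bindingDiff}(\mathit{sol}, \mathit{end})))\} \qquad (4.13)
\end{array}
\]

Line 4.13 gathers the changed variables of all the end states. We further summarize all the mcsgroups for the same variable into the mcsset of that variable:

\[
\begin{array}{l}
\mathit{mcsset} : (\mathit{DA} \times \mathit{Variable}) \rightharpoonup \mathbb{F}(\mathbb{F}\,\mathit{Variable}) \\
\hline
\forall x : \mathit{Variable};\ \mathit{da} : \mathit{DA} \bullet \\
\quad \mathit{mcsset}(\mathit{da}, x) = \bigcup \{\mathit{amcs} : \mathbb{F}(\mathbb{F}\,\mathit{Variable}) \mid \qquad (4.14) \\
\qquad (\forall \mathit{sol} : \mathit{da}.\mathit{states};\ v : \mathit{da}.\mathit{alphabet}(\!|\{x\}|\!) \bullet \mathit{amcs} = \mathit{mcsgroup}(\mathit{da}, x, \mathit{sol}, v))\} \qquad (4.15)
\end{array}
\]

Given a variable x, lines 4.14 and 4.15 union the mcsgroups over all the transitions triggered by changing its values starting in any design state. Finally, we summarize the mcssets of all the variables into a pairwise dependence relation:

\[
\begin{array}{l}
\mathit{coupling} : (\mathit{ConstraintNetwork} \times \mathit{cannot\_influence}) \rightarrow (\mathit{Variable} \leftrightarrow \mathit{Variable}) \\
\hline
\forall \mathit{cn} : \mathit{ConstraintNetwork};\ \mathit{cif} : \mathit{cannot\_influence};\ \mathit{deps} : \mathit{Variable} \leftrightarrow \mathit{Variable} \bullet \\
\quad \mathit{coupling}(\mathit{cn}, \mathit{cif}) = \mathit{deps} \Leftrightarrow \\
\qquad ((\mathrm{dom}\,\mathit{deps} \subseteq \mathit{cn}.V) \land (\mathrm{ran}\,\mathit{deps} \subseteq \mathit{cn}.V) \land \qquad (4.16) \\
\qquad (\forall \mathit{pair} : \mathit{deps} \bullet (\exists \mathit{mcs} : \mathit{mcsset}(\mathit{acn2da}(\mathit{cn}, \mathit{cif}), \mathit{first}\,\mathit{pair}) \bullet \mathit{second}\,\mathit{pair} \in \mathit{mcs}))) \qquad (4.17)
\end{array}
\]

We specify coupling as a function that maps a constraint network and a cannot_influence relation to a binary coupling relation. Line 4.16 ensures that both the domain and range of this relation are within the CN variable set. Line 4.17 specifies that a pair (x, y) belongs to the coupling relation if and only if y is present in some element of the mcsset of x.

4.4 Design Impact Analysis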

Given a current design, what are all the ways to compensate for a sequence of given decision changes? The answer to this question is necessary in developing methods, for example, to select the compensating state that reduces the long-term impact of change. We call it Design Impact Analysis (DIA).

The answer to this question is to find all the paths in the DA that start from the initial state and go along the edges labeled with the specified changes. The function impact formalizes DIA. It maps a CN, a design, and a sequence of changes to a set of sequences of new designs. Each sequence of designs is an evolution path accommodating the sequence of changes. The last states of these paths are the new designs that the original one could reach.

\[
\begin{array}{l}
\mathit{impact} : (\mathit{ConstraintNetwork} \times \mathit{Assignment} \times \mathrm{seq}(\mathit{Variable} \times \mathit{Value})) \rightharpoonup \mathbb{F}(\mathrm{seq}\,\mathit{Assignment}) \\
\hline
\forall \mathit{cn} : \mathit{ConstraintNetwork};\ \mathit{nfa} : \mathit{NFA};\ \mathit{design} : \mathit{Assignment};\ \mathit{changes} : \mathrm{seq}(\mathit{Variable} \times \mathit{Value}) \bullet \\
\quad \mathit{nfa} = \mathit{cn2nfa}(\mathit{cn}) \land \mathit{impact}(\mathit{cn}, \mathit{design}, \mathit{changes}) = \{\mathit{solseq} : \mathrm{seq}\,\mathit{Assignment} \mid \\
\qquad \mathit{design} = \mathit{solseq}(0) \land \qquad (4.18) \\
\qquad (\forall n : 2 \ldots \#\mathit{solseq} \bullet \mathit{solseq}(n) \in \mathit{nfa}.\mathit{transition}(\mathit{solseq}(n - 1), \mathit{changes}(n)))\} \qquad (4.19)
\end{array}
\]

Line 4.18 states that the start design is the first state in the evolution path. Line 4.19 states that each transition step consumes a change, leading to a set of new designs preserving that change.

4.5 Complexity

A basic question to be answered by a satisfactory coupling theory is, what is the computational complexity of the required analysis? We contribute a proof that the problem of computing the coupling structure from a finite domain constraint network (FDCN) is NP-complete (NPC). The proof is by reduction from FDCN satisfiability (FCSP), that is, deciding whether an FDCN has a solution, which is NPC [Garey et al. 1979]. We modify a given FDCN into FDCN’ so that its dependence relation is non-empty if and only if the original FDCN is satisfiable. The idea is to add two variables, x and y, with the same domain {a, b}, and a constraint that they have to take the same value: x = a ⇔ y = a. We call the modified network FDCN’. Now we prove that x and y depend on each other if and only if the original FDCN is satisfiable.

It is easy to see that FDCN and FDCN’ have the same satisfiability. If FDCN is satisfiable, it has at least one solution, Sol. For FDCN’, there are then at least two solutions: Sol ∪ {x = a, y = a} and Sol ∪ {x = b, y = b}. It is easy to tell that the coupling structure derived from FDCN’ will include (x, y) and (y, x). On the other hand, we prove that if the coupling relation of FDCN’ is not empty, then FDCN is satisfiable. From the definition of the mapping from an ACN to a coupling structure, we know that, if FDCN is not satisfiable, its design space will be empty, and so will the coupling structure. Now that we know it is not empty, FDCN’ must be satisfiable, and so must FDCN. The conclusion is that reasoning about coupling (and so modularity) in logical design models is intractable.³ However, this is not to say that there are no useful algorithms, as in other applications of formal models.

³ If the constraints are expressed in a notation as expressive as first-order logic with arithmetic, as might be used to model design spaces where resource consumption is critical, it is likely the problem is unsolvable, e.g., by reduction from the unsolvable problem of deciding whether a variable value is forced to zero [Borning 2004]. If true, there is no effective procedure for reasoning about coupling in logical models.

5. TOOL PROTOTYPE

We have implemented these techniques in Simon [Cai and Sullivan 2005]. It allows the user to build ACN models, derive DA representations, derive DSMs from DAs, compute NOV from DSMs, and execute DIA. Figure 8 illustrates the structure of Simon.
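The derivation chain from design states to DA transitions to a pairwise coupling relation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Simon's code: the six states and the domains are read off Figure 5, the two `CANNOT_INFLUENCE` pairs are hypothetical (they state that ds and alg cannot influence the matrix choice), and minimality is approximated by keeping candidates whose set of changed bindings has no proper subset among the other candidates.

```python
# Design states and domains of the matrix example, as read off Figure 5
# (the grouping of values into states is an assumption of this sketch).
STATES = [
    {"matrix": "sparse", "ds": "other_ds", "alg": "other_alg"},
    {"matrix": "dense", "ds": "other_ds", "alg": "other_alg"},
    {"matrix": "sparse", "ds": "list_ds", "alg": "list_alg"},
    {"matrix": "sparse", "ds": "list_ds", "alg": "other_alg"},
    {"matrix": "dense", "ds": "array_ds", "alg": "other_alg"},
    {"matrix": "dense", "ds": "array_ds", "alg": "array_alg"},
]
DOMAINS = {
    "matrix": ["dense", "sparse"],
    "ds": ["list_ds", "array_ds", "other_ds"],
    "alg": ["list_alg", "array_alg", "other_alg"],
}
# Hypothetical pairs (x, y): a change to x must not be compensated by changing y.
CANNOT_INFLUENCE = {("ds", "matrix"), ("alg", "matrix")}


def binding_diff(start, end):
    """Bindings of `end` that differ from `start`."""
    return frozenset((var, end[var]) for var in end if start[var] != end[var])


def transition(start, change):
    """Minimal, privileged destination states for one change."""
    var, value = change
    candidates = [s for s in STATES if s[var] == value]
    diffs = [binding_diff(start, s) for s in candidates]
    minimal = [(s, d) for s, d in zip(candidates, diffs)
               if not any(other < d for other in diffs)]
    return [s for s, d in minimal
            if not any((var, touched) in CANNOT_INFLUENCE for touched, _ in d)]


def coupling():
    """Pairs (x, y) such that some change to x is compensated by changing y."""
    pairs = set()
    for start in STATES:
        for var, values in DOMAINS.items():
            for value in values:
                if value == start[var]:
                    continue
                for end in transition(start, (var, value)):
                    pairs |= {(var, touched)
                              for touched, _ in binding_diff(start, end)
                              if touched != var}
    return pairs


print(sorted(coupling()))
```

Under these assumptions ds and alg come out as mutually coupled, and the matrix choice influences both without being influenced by either.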

Fig. 8. Simon Structure (panels: Input GUI; Analysis GUI)

Simon supports interactive input and incremental ACN model building through three GUI interfaces: a constraint network editor, a pairwise cannot-influence relation editor, and a clustering GUI. These three interfaces correspond to the three parts of an ACN. Figure 8 shows their snapshots in the input box. The constraint network editor takes design dimensions as design variables, and the possible choices in each dimension as variable domains. It also supports modeling of constraints among design decisions using a logical language. Users can add, delete, or change design variables, their domains, and constraints over time. The pairwise cannot-influence relation editor takes matrix input to maximize flexibility. The clustering GUI supports multiple design clusterings, corresponding to the different ways people might wish to consider sets of design decisions collectively.

Simon then translates an input ACN model into an Alloy [Jackson et al. 2001] specification and invokes the Alloy constraint solver to compute the set of constraint-satisfying design states. After that, it calls a program that we developed to derive the DA from the ACN, and the coupling structure from the DA. Based on the internal DA, Simon supports DSM, NOV, and DIA through their respective interfaces, as shown in the analysis GUI box in Figure 8.

For DSM analysis, Simon translates the coupling data structure into the DSM GUI. The rows and columns of the matrix are labeled with design variables organized by the selected clustering method. The user can then view DSMs with identical coupling structures but where variables are grouped differently into modules. The NOV GUI allows the user to annotate derived modules with the parameters needed as inputs to Baldwin and Clark’s option pricing model for estimating the economic value of modularity; it uses these annotations to compute the overall modularity value for a design according to their approach. The DIA GUI allows the user to compose a design and specify a sequence of changes as a list of variable-value bindings. Given these inputs, Simon computes the branching set of paths of consistent design states. The final states of these paths represent the set of new designs accommodating the specified changes. Simon displays the differences between each new design and the original design to allow the user to compare the results and to make decisions accordingly.

Using Simon, we evaluated our modeling and analysis techniques through a set of analyses and case studies. We present two representative ones: Parnas’s key word in context (KWIC) problem [Parnas 1972], and a web service application called WineryLocator studied by Lopes et al. [Lopes and Bajracharya 2005].
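The DSM view described above, in which one coupling structure is displayed under different clusterings, can be sketched as follows. This is a minimal illustration and not Simon's GUI code. The dependence pairs are hypothetical, and the sketch assumes the convention that a mark in cell (row, column) means the row variable depends on the column variable. The names `dsm_cells` and `render` are hypothetical.

```python
# Hypothetical dependence pairs (dependent, depended_on) for the matrix example.
DEPENDS_ON = {("ds", "matrix"), ("alg", "matrix"), ("ds", "alg"), ("alg", "ds")}


def dsm_cells(order, depends_on):
    """Square matrix of marks: '.' on the diagonal, 'x' where row depends on column."""
    return [["." if row == col else ("x" if (row, col) in depends_on else " ")
             for col in order]
            for row in order]


def render(order, depends_on):
    """Plain-text DSM with numbered row labels, in the style 'index:name'."""
    width = max(len(name) for name in order)
    lines = []
    for index, (name, row) in enumerate(zip(order, dsm_cells(order, depends_on)), start=1):
        lines.append(f"{index:>2}:{name:<{width}} " + " ".join(row))
    return "\n".join(lines)


# Two clusterings of the same variables: only the ordering changes.
design_rule_view = ["matrix", "ds", "alg"]
task_view = ["alg", "matrix", "ds"]
print(render(design_rule_view, DEPENDS_ON))
print()
print(render(task_view, DEPENDS_ON))
```

Reordering the variables moves the marks but never adds or removes one, which is why different clusterings can be compared on the same underlying coupling relation.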

6. CASE STUDY 1: PARNAS’S KWIC

Parnas’s KWIC (Key Word in Context) index system is a well-known and well-established benchmark for assessing concepts in software design. A KWIC index system accepts an ordered set of lines; each line is an ordered set of words, and each word is an ordered set of characters. Any line may be “circularly shifted” by repeatedly removing the first word and appending it at the end of the line. The KWIC index system outputs a listing of all circular shifts of all lines in alphabetical order.

Parnas comparatively and informally analyzed two designs for KWIC. In the first, sequential design (SD), modules correspond to steps in the sequential transformation of inputs to outputs. In the second, information hiding (IH) design, modules decouple design decisions deemed complex or likely to change. The IH design uses abstract data type interfaces to decouple key design decisions involving data structure and algorithm choices so that they can be changed without unduly expensive ripple effects. Parnas presents a comparative analysis of the changeability of the two designs. He postulates changes and assesses how well each design can accommodate them, measured by the number of modules that have to be redesigned for each change. He finds that the information-hiding modularization is better.

We modeled and studied these two designs using traditional DSMs [Sullivan et al. 2001]. The published quantitative analysis and manual DSMs provide a suitable basis for explication and result comparison. Using Simon, our evaluation proceeds in the following steps:

(1) Design modeling: we develop an ACN model for each design of KWIC in order to see whether these ACNs have the expressive capacity to capture the key design issues and constraints in an abstract and informative way, and whether they are adequate to model the environment impacts.

(2) DSM derivation: we derive DSMs from Simon for each design to answer the following questions: are the automatically generated DSMs consistent with the manual versions in the previous work [Sullivan et al. 2001]? Do they similarly reveal the key observations made there? If there is inconsistency, which one is more faithful in revealing important dependence properties such as ripple effects? Where does the inconsistency come from?

(3) NOV computation: given that there are inconsistencies between the manually constructed and the automatically derived DSMs, are the results we got from the manual DSMs still valid? Analyzed under different environment models, does the information hiding design still have higher value? Are the comparative quantitative results stable?

(4) DIA analysis: we apply DIA analysis to quantify Parnas’s qualitative comparative analysis of the changeability of the two KWIC designs. We first formalize the five possible changes he postulates as decision problems, and then use the Simon DIA interface to reveal the differences of the two designs with respect to each change, to see whether the quantitative results are consistent with Parnas’s qualitative analysis.

6.1 Design Modeling

For the SD, Parnas describes five modules: Input, Circular Shift, Alphabetizing, Output, and Master Control. He views each interface as providing two parts: an exported data structure, and a function signature to be invoked by the Master Control module. Given choices for these parameters, programmers produce function implementations. We modeled the choices of function signature, data structure, and implementation as design variables. As shown in Figure 9, variables ending with “_sig” model the function signatures. The choices of implementation are modeled by the variables ending with “_impl”. The choices of data structures are modeled by the variables ending with “_ds”. Parnas assumes original designs in each case and analyzes the impact of changes. We use the value orig in most of the domains to model Parnas’s original choices. We extend each design variable domain with other to model a choice that differs from the original one but is not yet elaborated. For example, input_fun_sig has domain {orig, other}.

    DesignSpace kwic_seq{
     1 envr_input_format:{orig,other};
     2 envr_input_size:{small,medium,large};
     3 envr_core_size:{small,large};
     4 envr_alph_policy:{once,partial,search};
     5 input_sig:{orig,other};
     6 circ_sig:{orig,other};
     7 alph_sig:{orig,other};
     8 output_sig:{orig,other};
     9 master_sig:{orig,other};
    10 input_ds:{other,core4,disk,core0};
    11 circ_ds:{index,copy,other};
    12 alph_ds:{orig,other};
    13 output_ds:{orig,other};
    14 input_impl:{orig,other};
    15 circ_impl:{orig,other};
    16 alph_impl:{orig,other};
    17 output_impl:{orig,other};
    18 master_impl:{orig,other};
    19 input_impl = orig => input_sig = orig && input_ds = core4;
    20 circ_impl = orig => circ_sig = orig && circ_ds = index;
    21 alph_impl = orig => alph_sig = orig && alph_ds = orig && circ_ds = index;
    22 output_impl = orig => output_sig = orig && output_ds = orig;
    23 alph_ds = orig => circ_ds = index;
    24 alph_impl = orig => input_ds = core4;
    25 circ_impl = orig => input_ds = core4;
    26 circ_ds = index => input_ds = core4;
    27 circ_ds = copy => input_ds = core4;
    28 output_impl = orig => input_ds = core4;
    29 output_impl = orig => alph_ds = orig;
    30 alph_ds = orig => input_ds = core4;
    31 input_ds = core4 => envr_input_size = medium || envr_input_size = small;
    32 input_ds = core0 => envr_input_size = small && envr_core_size = large;
    33 input_ds = disk => envr_input_size = large;
    34 circ_ds = copy => envr_input_size = small || envr_core_size = large;
    35 input_impl = orig => envr_input_format = orig;
    36 alph_impl = orig => envr_alph_policy = once;
    37 master_impl = orig => input_sig = orig && circ_sig = orig && alph_sig = orig && output_sig = orig && master_sig = orig;
    38 alph_ds = orig => envr_alph_policy = once;
    }

Fig. 9. SD Design Constraint Network

We represent the relationships among these design decisions as logical constraints. In the SD design, function implementations make assumptions about both the function signatures and the related data structures. For example, the circular shift function implementation has to know the circular shift function signature and how the circular shift data is arranged in core. According to Parnas, this function operates on data through an index in the original design. We model this choice as a value of circ_ds, index. Line 20 in Figure 9 models the constraint. To implement this function, it also has to know the data structure of the Input module. In the current design, the characters are packed four to a word, which is modeled as the value core4 of the variable input_ds. The constraint is modeled in Figure 9, line 25.

Figure 10 shows the constraint network for the IH design. A new module, Line Storage, is present. Its data structure variable linestorage_ds replaces the input_ds of the sequential design. The IH input module has no separate data structure. In the IH design, each module is also equipped with an abstract data type interface, modeled by variables ending with “_ADT”. Module implementations and data structures are modeled in the same way. According to the paper, a module only knows the ADTs of other modules. For example, the circular shift implementation now assumes the linestorage ADT, but no longer the data structure, as shown in Figure 10, line 28.

    DesignSpace kwic_ih{
     1 envr_input_format:{orig,other};
     2 envr_input_size:{small,medium,large};
     3 envr_core_size:{small,large};
     4 envr_alph_policy:{once,partial,search};
     5 input_ADT:{orig,other};
     6 linestorage_ADT:{orig,other};
     7 circ_ADT:{orig,other};
     8 alph_ADT:{orig,other};
     9 output_ADT:{orig,other};
    10 master_ADT:{orig,other};
    11 linestorage_ds:{core0,core4,disk,other};
    12 circ_ds:{copy,index,other};
    13 alph_ds:{orig,other};
    14 output_ds:{orig,other};
    15 linestorage_impl:{orig,other};
    16 input_impl:{orig,other};
    17 circ_impl:{orig,other};
    18 alph_impl:{orig,other};
    19 output_impl:{orig,other};
    20 master_impl:{orig,other};
    21 linestorage_impl = orig => linestorage_ADT = orig && linestorage_ds = core4;
    22 input_impl = orig => input_ADT = orig;
    23 circ_impl = orig => circ_ADT = orig && circ_ds = index;
    24 alph_impl = orig => alph_ADT = orig && alph_ds = orig;
    25 output_impl = orig => output_ADT = orig && output_ds = orig;
    26 master_impl = orig => master_ADT = orig && linestorage_ADT = orig && input_ADT = orig && circ_ADT = orig && alph_ADT = orig && output_ADT = orig;
    27 alph_impl = orig => circ_ADT = orig;
    28 circ_impl = orig => linestorage_ADT = orig;
    29 input_impl = orig => linestorage_ADT = orig;
    30 output_impl = orig => alph_ADT = orig;
    31 linestorage_ds = core4 => envr_input_size = medium || envr_input_size = small;
    32 linestorage_ds = core0 => envr_input_size = small && envr_core_size = large;
    33 linestorage_ds = disk => envr_input_size = large;
    34 circ_ds = copy => envr_input_size = small || envr_core_size = large;
    35 alph_ds = orig => envr_alph_policy = once;
    36 input_impl = orig => envr_input_format = orig;
    37 alph_impl = orig => envr_alph_policy = once;
    }

Fig. 10. IH Design Constraint Network

Parnas presents a comparative analysis of the changeability of the two designs based on their ability to accommodate changes in the following environment conditions: the input format, input size, core size, and alphabetizing policy. We model these environment conditions as environment variables. In both designs, the variables starting with “envr_” are environment variables. The environment conditions also have domains modeling different possibilities. For example, the domain of envr_core_size is {small, large}, and the domain of envr_alph_policy is {once, partial, search}. Data structures and implementations usually make assumptions about the related environment conditions. Lines 34 and 35 in Figure 9 are two examples.

In the SD design, Parnas noted, “All of the interfaces between the four modules must be specified before work could begin...” These are the choices of function signatures and data structures that dominate other design variables. Consequently, the SD cannot_influence relation includes pairs like (input_fun_impl, input_fun_sig), (input_fun_impl, input_ds), etc. Similarly, in the IH case, the choices of ADT interface definitions dominate other decisions, and pairs like (linestorage_ds, linestorage_ADT) are thus set in the IH cannot_influence relation. The SD interfaces and data structures and the IH ADT interfaces serve as design rules. In both designs, we assume that the environmental conditions are out of the control of the designers. Accordingly, (linestorage_ds, envr_input_size), (linestorage_ds, envr_core_size), etc., are included in the cannot_influence relations of each design.

There are multiple ways to cluster a design. Figure 11 shows the Simon clustering GUI supporting different views of the same KWIC design. For purposes such as task assignment, we want to group all variables involved in a particular function into a single module. For example, we could group envr_alph_policy, alph_ADT, alph_ds, and alph_impl into a module, as shown in Figure 11 (b). In our earlier work [Sullivan et al. 2001], we observed that for a design to be truly an information hiding modularization, the design rules should be invariant under changes in environment variables. To evaluate these two designs against this criterion, we want to cluster the environment parameters, design rules, and subordinate variables respectively into protomodules. In this case, for example, we group envr_alph_policy, envr_input_size, envr_core_size, and envr_input_format into an environment module, as shown in Figure 11 (c).

So far, we have positive answers for the design modeling questions: the modeling approach is expressive enough to capture the key design issues and constraints, as well as to model environment impacts.

6.2 DSM Derivation

After the DA and the coupling relation are generated by clicking the “Solve” menu item in Simon, the user is able to view the generated DSMs and analyze design impacts. We compare the DSMs that Simon generates from our KWIC ACN models with the manual results we presented in previous work [Sullivan et al. 2001]. We generated DSMs through Simon using the clustering method seen in Figure 11 (c). To ease the comparison, we copied the DSM generated by Simon into Excel and marked the differences from the manual DSMs. In the DSMs shown below, all the cells with dark backgrounds and white foregrounds represent the discrepancies between derived and manual DSMs. A blank dark cell means that there was an erroneous mark in the manual version. A cell with an “x” in it means that the dependence was missed in the manual version. Figures 12 and 13 present the SD and IH DSMs generated by Simon and marked in Excel. In each DSM, variables 1–4 are environment variables. The next run of variables are the design rule variables. The final run models the remaining open design choices.

Fig. 11. Simon clustering GUI for IH Design: (a) No Clustering; (b) Task Assignment View; (c) Design Rule View

Fig. 12. KWIC SD Derived DSM. Rows and columns: 1:envr_input_format, 2:envr_input_size, 3:envr_core_size, 4:envr_alph_policy, 5:input_fun_sig, 6:circ_fun_sig, 7:alph_fun_sig, 8:output_fun_sig, 9:master_fun_sig, 10:input_ds, 11:circ_ds, 12:alph_ds, 13:output_ds, 14:input_fun_impl, 15:circ_fun_impl, 16:alph_fun_impl, 17:output_fun_impl, 18:master_fun_impl

Fig. 13. KWIC IH Derived DSM. Rows and columns: 1:envr_input_format, 2:envr_input_size, 3:envr_core_size, 4:envr_alph_policy, 5:line_storage_adt, 6:input_adt, 7:circ_adt, 8:alph_adt, 9:output_adt, 10:master_adt, 11:line_storage_ds, 12:line_storage_impl, 13:input_impl, 14:circ_ds, 15:circ_impl, 16:alph_ds, 17:alph_impl, 18:output_format, 19:output_impl, 20:master_impl

By comparison, we are able to answer the validation questions for DSM derivation. First of all, our computed DSMs were largely consistent with the earlier results, validating the modeling and analysis concept. They reveal exactly the same key observations: the load-bearing walls of an information hiding design (the design rules) should be invariant with respect to changes in the environment, and such changes should be accommodated by changes to hidden (subordinate) design variables.

There are differences, however, which we now address. First, the computed DSMs reveal subtle errors in our manually produced DSMs, supporting our intuition that logic modeling and automated analysis are more reliable than manual modeling and analysis. In our computed information hiding (IH) DSM, cells (17, 7) and (19, 8) revealed dependences missing from our manual model. It also lacks several dependences that should not have been present in the manual version. An extra variable, input_ds, which is redundant with linestorage_ds, was removed. Finally, the environment variables core size and input size are also now shown as dependent, in that a change in one can be compensated for by a change to the other.

The second class of differences between Simon’s output and our manual calculation consists of important ripple effects in the computed DSMs that are not shown in the manual version. For example, our manual sequential design (SD) DSM had no dependence between output_fun_impl and circ_ds. The derived DSM revealed this dependence owing to two constraints in its ACN model:

    output_fun_impl = orig ⇒ alph_ds = orig
    alph_ds = orig ⇒ circ_ds = index

Parnas’s paper confirms the presence of this dependence and the correctness of the formal model and derived DSM. Even in such a small example, manual DSM construction is error-prone. Automated tool support is critical for correct modeling and analysis of complex design constraint networks.

6.3 NOV Computation

In our previous paper [Sullivan et al. 2001], we computed the NOV values for the manually constructed DSMs. Since we have shown that the automatically derived DSMs differ from the manual versions, we redo the experiment to see whether the results are consistent with our previous work. In that paper, our NOV analysis was based on the following assumptions and notions, which we continue to use in the new Simon experiments:

- $N$ is the number of design parameters in a given design.
- Given a module of $p$ parameters, its complexity is $n = p/N$.
- The value of one experiment on an unmodularized design, $\sigma N^{1/2} Q(1) = 1$, is the value of the original system.
- The design cost $c = 1/N$ of each design parameter is the same, and the cost to redesign the whole system is $cN = 1$.
- The visibility cost of module $i$ of size $n$ is $Z_i = \sum_{j \text{ sees } i} cn$.
- One experiment on an unmodularized system breaks even: $\sigma N^{1/2} Q(1) - cN = 0$.

Baldwin and Clark make the break-even assumption for an example in their book [Baldwin and Clark 2000]. For a given system size, it implies a choice of technical potential for an unmodularized design: in our case, $\sigma = 2.5$. We take this as the maximum technical potential of any module in a modularized version. This assumption for the unmodularized KWIC is a modeling assumption, not a precisely justified estimate. In practice, a designer would generally have to justify the choices of parameter values more convincingly.

We have observed that the environment is what determines whether variants on a design are likely to have added value. If there is little added value to be gained by replacing a module in a given environment, no matter how complex it is, then the module has low technical potential. We chose to estimate the technical potential of each module as the system technical potential scaled by the fraction of the environment variables relevant to the module.
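The break-even assumption above fixes the technical potential once Q(1) is known. The sketch below assumes, following Baldwin and Clark's model rather than anything stated in this section, that Q(k) is the expected value of the best of k independent standard normal draws, counting only positive outcomes, and that the complexity of the whole unmodularized system is normalized to 1. All function names, and the parameter counts in the last lines, are hypothetical and chosen only for illustration.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Cumulative distribution of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def q(k: int, upper: float = 10.0, steps: int = 20000) -> float:
    """Assumed definition: Q(k) = k * integral_0^inf z * Phi(z)^(k-1) * phi(z) dz.

    The integral is approximated with the trapezoid rule on [0, upper].
    """
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        f = k * z * std_normal_cdf(z) ** (k - 1) * std_normal_pdf(z)
        total += f * (0.5 if i in (0, steps) else 1.0)
    return total * h


def break_even_sigma(system_complexity: float = 1.0) -> float:
    """Solve sigma * complexity^(1/2) * Q(1) = 1 for sigma (complexity assumed to be 1)."""
    return 1.0 / (math.sqrt(system_complexity) * q(1))


def complexity(p: int, total: int) -> float:
    """Module complexity n = p / N, as defined in the list above."""
    return p / total


if __name__ == "__main__":
    print(round(q(1), 4), round(break_even_sigma(), 3))
    # Hypothetical counts: a module of 2 parameters in a design of 16.
    print(complexity(2, 16))
```

Under these assumptions Q(1) is about 0.399, so the break-even technical potential is about 2.507, which is consistent with the rounded value of 2.5 used in the text.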
We further scaled the technical potential of the modules in the SD design by 0.5, for two reasons. First, about half of the interactions of the environment variables with the SD design are with the design rules (but their visibility makes the cost to change them prohibitive). Second, and more of a judgment call, the hidden modules in this design (algorithms) are tightly constrained by the design rules (data structures that are assumed not to change). There would appear to be little to be gained by varying the algorithm implementations alone.

The bottom line was that the system NOV was 0.26 for the SD design but 1.56 for the IH design. These numbers are fractions of the value of the proto-modularized system, which has base value 1. Thus the value of the system with the information-hiding design was predicted to be 2.6 times that of the system with the unmodularized design, and that of the SD design only 1.26 times as much. Our model suggested that the IH version of the system was twice as valuable as the SD version. Ignoring the base value and focusing just on modularity, the model predicted that the IH design provides 6 times more value in the form of modularity than the SD design.

In Simon, we can reproduce the result exactly by first clustering the DSMs in the same way as the manual ones and then assigning these modules the same parameters as before. Figures 14 and 15 are the Simon snapshots repeating the previous experiments. The upper right tables in Figures 14 and 15 show our assumptions about the technical potential, complexity, and visibility cost of the modules in the SD and IH designs.
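The comparisons quoted above follow from simple arithmetic on the two NOV figures and the base value of 1. The lines below only restate that arithmetic, with rounding as in the text; the symbols $V_{\mathrm{IH}}$ and $V_{\mathrm{SD}}$ for total system value are introduced here for illustration.

```latex
\begin{align*}
V_{\mathrm{IH}} &= 1 + 1.56 = 2.56 \approx 2.6, &
V_{\mathrm{SD}} &= 1 + 0.26 = 1.26, \\
\frac{V_{\mathrm{IH}}}{V_{\mathrm{SD}}} &= \frac{2.56}{1.26} \approx 2.0, &
\frac{\mathrm{NOV}_{\mathrm{IH}}}{\mathrm{NOV}_{\mathrm{SD}}} &= \frac{1.56}{0.26} = 6.
\end{align*}
```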

Fig. 14. NOV Computation for Manual KWIC SD

Since the derived DSMs that Simon works on are quite different from the manual ones, this repetition assumes that (1) the new design is placed under the environment parameters we used for the manual DSMs; (2) the coupling relations among hidden modules are the same; and (3) each module has the same parameters. We now analyze whether these assumptions remain valid for the newly derived DSMs.

First, in the previous work [Sullivan et al. 2001], we hypothesized the possible forces driving changes that Parnas might have selected, and that appear to be implied in his analysis, and categorized them into three environment variables: computer configuration (e.g., device capacity, speed); corpus properties (input size; language, e.g., Japanese); and user profile (e.g., computer savvy or not, interactive or offline), as shown in Figures 8, 9, and 10 of that paper [Sullivan et al. 2001] for the SD, proto-modular, and IH designs, respectively. The environment variables we used in the ACN modeling are a literal translation of Parnas's prose mentioning possible changes. Both environments are valid models, and the first assumption is valid.

Second, as we can see from the DSMs in Figures 12 and 13, apart from the environment section, the main differences between the manual and derived DSMs are concentrated in the dependences between design rules and hidden modules, and there are no dependences among hidden modules in either the manual or the derived DSMs. So the second assumption is valid.

The third assumption is problematic, though: (1) In both ACN models, we separate the interface of MasterControl from its implementation, which influences the complexity count. For example, the derived SD DSM now has one more design variable than the manual SD DSM. (2) In the manual IH DSM, input_ds was redundant with linestorage_ds, and it is removed in the derived IH DSM shown in Figure 13. This difference influences the complexity and technical potential estimates of the Input module in the derived IH DSM: the complexity changes from 0.125 to 0.0625, and the technical potential drops from 2.5 to 1.6. We show the updated NOV computations for both designs in Figures 16 and 17.

Comparing the new NOV computation in Figure 16 with the old one in Figure 14, and Figure 17 with Figure 15, each pair shares the same technical potentials (assuming the same environments) but has different complexity estimates. According to the derived DSMs, the system NOV is now 0.29 for the SD design and 1.30 for the IH design. Still, focusing just on modularity, the model predicts that the IH design provides 4.5 times more value in the form of modularity than the SD design. Our comparative result is still valid.

Now that we have modeled both designs under a different set of environment variables, we would like to evaluate whether the comparison result still holds under the new environment setting. We therefore re-estimate the technical potentials for each derived DSM, as shown in Figures 18 and 19. The new technical potential estimation follows the old way: the maximum is 2.5, and the technical potential of each module is estimated as the maximum scaled by the fraction of the environment variables relevant to the module.

Fig. 15. NOV Computation for Manual KWIC IH

Fig. 16. NOV Computation for Manual KWIC SD

Fig. 17. NOV Computation for Manual KWIC IH

Fig. 18. NOV Computation for Derived KWIC SD

Fig. 19. NOV Computation for Derived KWIC IH

Comparing the new NOV computation in Figure 18 with the one in Figure 16, and Figure 19 with Figure 17, each pair shares the same complexity estimates but differs in technical potentials. According to the derived DSMs, the system NOV is now 0.28 for the SD design and 1.36 for the IH design. Still, focusing just on modularity, the model predicts that the IH design provides 4.9 times more value in the form of modularity than the SD design. Our comparative result is still valid.

These analyses basically confirm the results we got from the manual DSMs: although we constructed the environment parameters differently and independently, we got the same comparative result: the IH design provides far more value than the SD design does. In addition, based on the same, more accurate, derived DSMs, the results are closer under different environments: for the SD design, we got 0.29 and 0.28, and for the IH design, 1.30 and 1.36, under the original and the new environment models respectively.

6.4 Design Impact Analysis

Parnas comparatively analyzed these two designs by considering the following possible changes, which we model using our framework:

(1) The input format changes. This implies that there could be input format choices other than the current one. Accordingly, we model the domain of envr_input_format as {orig, new}. In the original design, envr_input_format = orig. The change is modeled as envr_input_format = new.
(2) The input size becomes so large that not all lines can be kept in core. We model this change as envr_input_size = large.
(3) The input size gets so small that a word could be unpacked, modeled as envr_input_size = small. These two changes imply that the input size of the original design is medium, modeled as envr_input_size = medium, and that the domain of envr_input_size is {small, medium, large}.
(4) The alphabetizing policy is changed to partial or search, modeled by envr_alph_policy = partial and envr_alph_policy = search. In the original design, envr_alph_policy = once.

Parnas's informal comparative analysis can be formulated as follows: given an original design, and given changes in the environment (input size, core size, etc.), what are the feasible new designs that accommodate the given changes, and how many modules have to change to get to these new design states? The number of modules that have to change is obviously a simple proxy for cost, but for now we are satisfied to live with it. In future work we expect to explore richer cost-of-change models.

Figures 20 and 21 are snapshots of the KWIC SD DIA input and output GUI, in which the effects of changing the input size from medium to large are analyzed. We organized all the changes and their impacts on both designs, as computed by Simon, into Figure 22. The numbers in the circles represent the design states of the DAs. The double circles are the start states. The figure shows part of the SD and IH DAs with states S18 and S1034 as the respective start states.
Transitions are labeled with the changes shown in the table below. The tables associated with the end states show what other variables are changed in the destination states. For example, in the SD DA, changing the input size to large (the transition labeled C2) leads from state S18 to state S555 or S865. In both of them, seven other variables are changed to compensate for the driving change. The numbers in the last two columns of the lower table represent the number of other variables that are affected by the changes in each design. The results confirm in a fully formal way that the IH design space involves fewer redesign requirements. For example, when the input size gets large, 7 dimensions have to be touched in the SD design, but only 2 in the IH design.

Fig. 20. Tool Snapshot: KWIC SD Design Impact Analysis Input

Fig. 21. Tool Snapshot: KWIC SD Design Impact Analysis Output

Label   Change                        SD   IH
C1      envr_input_format = new       1    1
C2      envr_input_size = large       7    2
C3      envr_input_size = small       0    0
C4      envr_alph_policy = partial    3    2
C5      envr_alph_policy = search     3    2

Fig. 22. Partial Non-deterministic Finite Automaton for SD and IH design

So far, we have quantitatively confirmed Parnas’s qualitative analysis. By associating each variable with an economic value, we should be able to further estimate the economic cost of each evolution step, if requested.
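The kind of question answered in this section (which states accommodate an environment change, and how many other variables must change to reach them) can be illustrated on a toy constraint network. The sketch below is not Simon's algorithm: it enumerates solutions by brute force and takes "fewest other variables changed" as its notion of a minimally different destination state, which is an assumption. The two constraints marked "from the text" are the SD constraints quoted in Section 6.2; the third constraint, the extra domain values, and all function names are hypothetical.

```python
from itertools import product

# Toy network: variable -> domain. "copy" and "other" are hypothetical values.
DOMAINS = {
    "envr_input_size": ["medium", "large"],
    "circ_ds": ["index", "copy", "other"],
    "alph_ds": ["orig", "other"],
    "output_fun_impl": ["orig", "other"],
}


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


CONSTRAINTS = [
    # from the text: output_fun_impl = orig => alph_ds = orig
    lambda s: implies(s["output_fun_impl"] == "orig", s["alph_ds"] == "orig"),
    # from the text: alph_ds = orig => circ_ds = index
    lambda s: implies(s["alph_ds"] == "orig", s["circ_ds"] == "index"),
    # hypothetical: the index structure only fits a medium input size
    lambda s: implies(s["circ_ds"] == "index", s["envr_input_size"] == "medium"),
]

START = {
    "envr_input_size": "medium",
    "circ_ds": "index",
    "alph_ds": "orig",
    "output_fun_impl": "orig",
}


def solutions():
    """Yield every assignment that satisfies all constraints."""
    names = list(DOMAINS)
    for values in product(*(DOMAINS[n] for n in names)):
        state = dict(zip(names, values))
        if all(check(state) for check in CONSTRAINTS):
            yield state


def impact(start, var, value):
    """Return (fewest other variables changed, destination states achieving it)."""
    def others_changed(state):
        return [n for n in state if n != var and state[n] != start[n]]

    candidates = [s for s in solutions() if s[var] == value]
    if not candidates:
        return None, []
    fewest = min(len(others_changed(s)) for s in candidates)
    return fewest, [s for s in candidates if len(others_changed(s)) == fewest]


if __name__ == "__main__":
    fewest, destinations = impact(START, "envr_input_size", "large")
    print(fewest, "other variables change")
    for d in destinations:
        print(d)
```

In this toy network the change ripples through all three design variables, and there are two equally cheap destination states, which mirrors the non-determinism of the transitions in Figure 22.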

7. CASE STUDY 2: WINERYLOCATOR, A WEB APPLICATION

In their paper [Lopes and Bajracharya 2005], Lopes et al. studied a web application called WineryLocator. The authors used DSMs to model and compare object-oriented and aspect-oriented designs for WineryLocator. Their purpose, analogous to that of Sullivan et al., was to model the value of modularity [Sullivan et al. 2001]; in this case, with a focus on the benefits of aspect-oriented modularity.

WineryLocator is designed to locate wineries in California. A user can input either an approximate or an exact address, from which the application determines the exact address to use as a valid starting point. After that, the user can select preferences for the wineries. Given a starting point and the preferences, the application generates a route for a tour consisting of all the wineries that match the preferences. The application outputs a set of stops in the route, a navigable map, and, on user request, the driving directions.

We replicated their experiment to test the hypothesis that our framework can express and analyze the system designs they studied. We formulated the five designs they presented informally as ACNs, and derived DSMs.

7.1 Design Modeling

We built our models from their semantic description of the system. The names and the number of variables in our DSMs are somewhat different from theirs. For example, we separated interface and implementation variables. To make their manual DSMs and our derived DSMs comparable, we model the design dimensions of their manual DSMs in our ACNs. We also use the same order and follow their naming convention as much as possible to make the correspondences clear and the DSMs comparable.

According to the authors, there are three types of main application functions: the functions the user uses directly (user functions), the service functions used by the user functions, and the security and auditing functions (system functions). For each function dimension, following a typical OO design, we separate it into implementation and interface dimensions. To build the WineryLocator ACNs, we first model these dimensions as variables:

(1) User functions:
- Find an exact location according to user input, modeled as startWineryFind_sig and startWineryFind_impl.
- Search preferred wineries, modeled as searchWinery_sig and searchWinery_impl.
- Present all the stops constituting a tour, modeled as tour_sig and tour_impl.
- Generate driving directions, modeled as directions_sig and directions_impl.
(2) Service functions:
- Request and get the exact address, modeled as AddressLocator_sig and AddressLocator_impl.
- Request and get preferred wineries, modeled as WineryFinder_sig and WineryFinder_impl.
- Request and get routes, modeled as RouteMapHandler_sig and RouteMapHandler_impl.
(3) System functions:
- Authenticate address requests, modeled as AuthAddressLocator_sig and AuthAddressLocator_impl.
- Authenticate route requests, modeled as AuthRouteMapHandler_sig and AuthRouteMapHandler_impl.
- Log all service calls, modeled as WebServiceLogger_sig and WebServiceLogger_impl.

Their system uses several third-party services, such as MapPoint and Apache AXIS. The authors modeled them as environment parameters. In their place, we would instead model as environment parameters the environment conditions that are likely to change. For example, the user interface could be either web-based or a GUI application based on Java Swing. They mentioned this as a possible change in the paper, but did not model it. However, to make the two sets of DSMs comparable, we model these services and their APIs as environment variables, as they do. These services are MapPoint, ApacheAXIS, Servlet, HttpSessionBindingListener, and MapPointDesignRules. Other design dimensions include a local service and its API developed by the authors: WineryFind and WineryFindDesignRules. This service is used to get the wineries matching given criteria. A configuration file called web_xml also participates; it is an XML file storing the user name, password, URLs, etc.

Given all these basic dimensions, they also introduced five interfaces to decouple the effects of MapPoint as much as possible. Following their names for these interfaces, we model them as the following design variables:

- startAddress_Address: the starting location the user provides and selects.
- matches_Address: the data structure storing the set of matched addresses.
- WinerySearchOption: the data structure storing the preferences.
- Tour: tour representation.
- MapOptions: standard map options.

For all the dimensions represented by design variables, we assume that each has at least two choices: the current choice and some other unelaborated choice. So their domains are {orig, other}. The constraints among these variables are:

- Implementations assume interfaces.
- User functions assume service function interfaces and other user function interfaces.
- Service functions assume that the relevant third-party services are available.
- System functions are used by other functions.
- Other constraints, such as where the web.xml file is used.

For modeling which decisions cannot influence which others, we assume that interfaces are design rules; that environment variables are beyond the control of other variables; and that the system functions, such as authentication and logging, should not affect the service function implementations. For clustering, we follow the order the authors gave. For a function like tour, we put tour_sig and tour_impl together as a tour module. This completes the ACN for the WineryLocator OO design.

The authors presented an AOP design in which logging and authentication functions are implemented using aspects. Modeling the AOP design is similar to modeling the OO design. The aspects are also design dimensions that can be modeled by design variables: Aspect_Logging and Aspect_Authentication.
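The modeling steps above (variables with {orig, other} domains, implication constraints, and assumptions about what cannot influence what) can be sketched for a small fragment of the OO design. The choice of fragment, the helper names, and the reading of dependence as "some fewest-change accommodation of a change to x also changes y" are assumptions of this sketch; it uses brute-force enumeration and is not a description of how Simon derives DSMs.

```python
from itertools import product

ENV = ["MapPoint"]
SIGS = ["RouteMapHandler_sig", "tour_sig"]
IMPLS = ["RouteMapHandler_impl", "tour_impl"]
VARS = ENV + SIGS + IMPLS
DOMAIN = ["orig", "other"]


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


CONSTRAINTS = [
    # Implementations assume interfaces.
    lambda s: implies(s["tour_impl"] == "orig", s["tour_sig"] == "orig"),
    lambda s: implies(s["RouteMapHandler_impl"] == "orig",
                      s["RouteMapHandler_sig"] == "orig"),
    # A user function assumes a service function interface.
    lambda s: implies(s["tour_impl"] == "orig", s["RouteMapHandler_sig"] == "orig"),
    # A service function assumes that its third-party service is available.
    lambda s: implies(s["RouteMapHandler_impl"] == "orig", s["MapPoint"] == "orig"),
]

# Pairs (x, y) meaning "x cannot influence y": nothing influences the
# environment, and implementations cannot influence interfaces (design rules).
DOMINANCE = {(x, e) for e in ENV for x in SIGS + IMPLS} | {
    (i, s) for i in IMPLS for s in SIGS
}


def solutions():
    """Yield every assignment that satisfies all constraints."""
    for values in product(DOMAIN, repeat=len(VARS)):
        state = dict(zip(VARS, values))
        if all(check(state) for check in CONSTRAINTS):
            yield state


def derive_dsm():
    """Return depends[y] = set of x whose change can force y to change."""
    sols = list(solutions())
    depends = {y: set() for y in VARS}
    for start in sols:
        for x in VARS:
            for value in DOMAIN:
                if value == start[x]:
                    continue
                accommodations = []
                for s in sols:
                    if s[x] != value:
                        continue
                    changed = [y for y in VARS if y != x and s[y] != start[y]]
                    # Skip destinations that change a variable x cannot influence.
                    if any((x, y) in DOMINANCE for y in changed):
                        continue
                    accommodations.append(changed)
                if not accommodations:
                    continue
                fewest = min(len(c) for c in accommodations)
                for changed in accommodations:
                    if len(changed) == fewest:
                        for y in changed:
                            depends[y].add(x)
    return depends


if __name__ == "__main__":
    dsm = derive_dsm()
    for y in VARS:
        row = ["." if x == y else ("x" if x in dsm[y] else " ") for x in VARS]
        print(f"{y:22s}", " ".join(row))
```

In this fragment only the two implementation variables end up depending on anything: the interfaces and the environment variable have empty rows, because the cannot-influence pairs rule out accommodating an implementation change by changing an interface or the environment.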

Fig. 23. Derived WineryLocator Design Rule DSMs (rows and columns: 0:MapPoint, 1:ApacheAXIS, 2:WineryFind, 3:Servlet, 4:HttpSessionBindingListener, 5:MapPointDesignRules, 6:WineryFindDesignRules, 7:startAddress_Address, 8:matchesAddress, 9:WinerySearchOption, 10:Tour, 11:MapOperation, 12:WebServicesLogger_sig, 13:WebServicesLogger, 14:AddressLocator_sig, 15:AddressLocator, 16:AuthAddressLocator_sig, 17:AuthAddressLocator_impl, 18:WineryFinder_sig, 19:WineryFinder, 20:RouteMapHandler_sig, 21:RouteMapHandler, 22:AuthRouteMapHandler_sig, 23:AuthRouteMapHandler_impl, 24:startWineryFind_sig, 25:startWineryFind_impl, 26:searchWinery_sig, 27:searchWinery_impl, 28:tour_sig, 29:tour_impl, 30:directions_sig, 31:directions_impl, 32:web_xml)

The constraints in the AOP design change a little. The aspect variables now assume the implementations of the functions. The functions no longer need to know the interfaces of the system functions. In essence, the modeling of the AOP and OO designs is the same. In both ACN models, logging and authentication are just design variables. The two designs differ in how the variables are constrained. We observe that ACNs are expressive enough to uniformly capture design issues such as third-party services, interfaces, implementations, project configuration, and aspects.

7.2 DSM Derivation

Figure 23 shows the DSM for the DR design generated by Simon. Figure 24 shows a collapsed DSM for comparison with their manual DSMs. In the collapsed DSM, the interface and implementation of each function are aggregated into one design dimension. Figure 25 shows the AOP design DSM generated and clustered by Simon.

By comparing the right DSM in Figure 0?? with Figure 5 in the work of Lopes and Bajracharya [Lopes and Bajracharya 2005], we found that they are largely consistent. However, the comparison reveals discrepancies. The cells with black background are the marks that are missing in their manual DSMs. Tracing the differences exposed several interesting issues, showing the advantages of our formal model and automatic tool. Some of the issues are unique to software design dependence modeling.

First, our derived DSMs reveal many indirect dependences not shown in their manual ones. For example, they chose MapPoint as their major library, which influences many other decisions. However, in their DSMs, only one module depends on it. Although a higher-order matrix might reveal indirect dependences, these indirect dependences are not accounted for in Baldwin and Clark's value model, which they are using [Baldwin and Clark 2000]. By contrast, the derived DSMs yield defensible estimates of the total impact of given local changes in design.

Second, the dependence definition in the manual DSM modeling is ambiguous, making the manual DSMs hard to understand. Take three design dimensions, startWineryFind, AddressLocator, and AuthAddressLocator, as an example. The first is a function that uses the service provided by the second to locate addresses. The third inherits from the second and extends it with authentication functions. While our derived DSMs show that the first depends on the other two, their manual DSMs indicate only a dependence of startWineryFind on AuthAddressLocator, but not on AddressLocator, despite the fact that AddressLocator interface changes affect the startWineryFind function directly.

We understood the reason for this discrepancy after discussing it with the authors to learn exactly how the system is implemented. The dependence between startWineryFind and AuthAddressLocator is due to usage: the former is a JSP page using the latter as a JavaBean. Since startWineryFind does not refer to AddressLocator directly, the authors did not mark them as dependent. The usage and inheritance relations are different, so using transitive closure operations to find this missing dependence does not seem proper. By contrast, our framework provides an exact semantics of dependence: a change in one design decision causes revisitation and revision of the other. Using this definition, the missing dependences are discovered directly.

Fig. 24. Collapsed WineryLocator Design Rule Design (rows and columns: 0:MapPoint, 1:WineryFind, 2:ApacheAXIS, 3:Servlet, 4:HttpSessionBindingListener, 5:MapPointDesignRules, 6:WineryFindDesignRules, 7:startAddress_Address, 8:matchesAddress, 9:WinerySearchOption, 10:Tour, 11:MapOperation, 12:WebServicesLogger, 13:AddressLocator, 14:AuthAddressLocator, 15:WineryFinder, 16:RouteMapHandler, 17:AuthRouteMapHandler, 18:startWineryFind, 19:searchWinery, 20:tour, 21:directions, 22:web_xml)

Fig. 25. WineryLocator Aspect Oriented Design (rows and columns: 0:MapPoint, 1:WineryFind, 2:ApacheAXIS, 3:Servlet, 4:HttpSessionBindingListener, 5:MapPointDesignRules, 6:WineryFindDesignRules, 7:startAddress_Address, 8:matchesAddress, 9:WinerySearchOption, 10:Tour, 11:MapOperation, 12:AddressLocator_interface, 13:AddressLocator_impl, 14:WineryFinder_interface, 15:WineryFinder_impl, 16:RouteMapHandler_interface, 17:RouteMapHandler_impl, 18:startWineryFind_interface, 19:startWineryFind_impl, 20:searchWinery_interface, 21:searchWinery_impl, 22:tour_interface, 23:tour_impl, 24:directions_interface, 25:directions_impl, 26:WebServicesLogger_impl, 27:aspect_Logging, 28:aspect_Authentication)

7.3 NOV Computation

In their paper, Lopes et al. applied the NOV analysis, as we did in our previous paper [Sullivan et al. 2001], to compare the values of the different designs. We have seen that the DSMs generated by Simon are different from the DSMs they constructed manually. Since more ripple effects and interdependences are detected, we recompute the NOV values in Simon to evaluate the differences, estimating the NOV parameters in the same way they did. Figures 26 and 27 are the Simon snapshots for the new NOV computations based on the DSMs shown in Figures 24 and 25, respectively. In their paper, the NOV values for the DR design and the AO design are 1.59 and 1.76 respectively, indicating that the AOP design improved the flexibility of the DR design by 11%. Based on the newly derived DSMs, the NOV values for these two designs are 0.55 and 0.87, indicating an improvement of 58%. Since the scales of the technical potential parameters differ because of their estimation method, only the comparative results matter.
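The two improvement percentages quoted above are relative increases in NOV from the DR design to the AO design. As a quick check of the arithmetic (rounded as in the text):

```latex
\begin{align*}
\text{manual DSMs:}\quad & \frac{1.76 - 1.59}{1.59} \approx 0.107 \approx 11\%, \\
\text{derived DSMs:}\quad & \frac{0.87 - 0.55}{0.55} \approx 0.582 \approx 58\%.
\end{align*}
```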

Fig. 26. NOV Computation for Derived WinerySearch DR Design

Fig. 27. NOV Computation for Derived WinerySearch AO Design

The result shows that the aspect modules add more value to the system than they had computed. The aspect logging module, now having the highest technical potential, contributes most to the increment. Its high technical potential comes from the fact that it can accommodate the changes coming from 5 external parameters. Their modeling approach thus tells the following story: the flexibility of logging, a subsidiary part of the system, adds a lot of value to the system, owing to its ability to accommodate external libraries.

We have a problem here. In our previous work [Sullivan et al. 2001], we extended traditional DSMs with environment variables to model the external forces that drive changes and bring value. The value of modularity comes from the ability to accommodate these changes by substituting hidden modules in search of higher value. In their work, they model third-party libraries and APIs, such as MapPoint services, Apache AXIS, and Java Servlet APIs, as environment parameters. Although these services might change because of upgrades, such changes are less likely to be the evolution drivers of the WinerySearch application. On the other hand, although they explicitly mentioned dimensions that might change, such as the possibility of switching from a web-based application to a GUI application based on Java Swing, these dimensions are not part of their environment models. They also recognized and implied in their paper that end user requirements could drive changes and bring value: "we assume that sigma to be depend on the modules relevance to the end users of the system. A module that an end user directly interacts with or, benefit from, is likely to add more value than a module that is hidden from or irrelevant to the users." However, the end user requirements were not modeled as environment parameters.

Our framework not only provides a mechanized way to derive DSMs and compute NOVs, but also embodies a design modeling methodology: identifying and modeling both design dimensions and possible evolution and value drivers.

8. OUR WORK AND INFORMAL MODULARITY THEORIES

Our work provides a basis for finally formalizing the information hiding criterion of Parnas. In an information hiding design, design rules must be invariant with respect to anticipated changes in environment variables: such changes should be accommodated by changes to hidden variables [Sullivan et al. 2001]. The framework we develop in this paper gives us the vocabulary we need to formalize this idea precisely. We could formalize it as a predicate stating that a coupling relation derived from an augmented constraint network should not contain any pair whose first element is in an environment variable cluster and whose second element is in a design rule cluster. In the KWIC examples, the IH design is an information hiding design. There are no computed dependences between environment variables and ADT interfaces, and environment changes are accommodated by changes to hidden variables. The SD design does not meet this criterion. With this formalization, it is possible to rigorously assess whether a particular form of modularity is achieved.

Baldwin and Clark's informal design rule (DR) and modularity theory is based on the idea that modularity adds value in the form of real options. A module creates an option that has economic value. A design rule is a stable design decision, such as a mutually agreeable interface definition or contract, that serves to decouple (modularize) otherwise dependent design decisions, and to increase the option value of a design and design process.

Our framework formalizes core concepts in Baldwin and Clark's theory. The notions of design dimension, design decision, and design decision dependence are formally captured by variables, bindings of values to variables, and constraints. Their ordering and clustering of variables into proto-modules is captured in our cluster concept. And the asymmetries in design decision making crucial to their concept of design rules are captured in our dominance relation. Our DAs embody their design spaces. Our DA transitions express a new concept.

Our analysis led us to see two interesting potential oversights in Baldwin and Clark's option pricing model. Very briefly, their model suggests that the option value of a module depends on the benefits that can be obtained by investing in a search (R&D) for improved versions, offset by the cost of switching to a new one. First, their model of switching cost accounts only for disruptions to design decisions that directly depend on the one being changed. It should account for the full impact of all ripples. Second, DSMs are not expressive enough to capture the possibility of accommodating a change in several different ways. Baldwin and Clark's approach, which is based on DSMs, thus assumes that there is essentially only one way, and one cost, to accommodate a given change. In other words, DSMs do not capture the kind of non-determinism that our DAs express.

9. RELATED WORK

9.1 Constraint Network

Using constraint networks to represent designs is a long-established idea in AI [Freuder 1989; Mackworth 1977]. Our work differs from traditional AI work in both purpose and approach. While the purpose of that work is to find optimal designs, ours is dependence analysis. Our ACN modeling is different because of the additional dominance relation and clustering structure. Our DA differs from AI constraint graphs in that we consider more than dependences inferred from constraint syntax.

9.2 Design Coupling

The idea that coupling structure is a key to adaptability and value in design is old [Alexander 1970; Baldwin and Clark 2000; Ashby 1952; Simon 1996; Stevens et al. 1974]. Using CNs in design is also familiar, although it is not well developed in software engineering. To our knowledge, a coupling theory for logical design spaces as a basis for a formal account of modularity in design is new.

9.3 Software Evolution, Modularization

Parnas’s seminal paper [Parnas 1972] informally grounded the theory of modularization and evolution. After that, Belady and Lehman [Lehman 1985] studied the OS/360 evolution process, and established a structural degradation model. Baldwin and Clark [Baldwin and Clark 2000] expressed the OS/360 evolution procedure using DSMs, making explicit how modularization generates value according to their theory of design rules.

9.4 Impact Analysis

Traditional impact analysis research focuses on change issues at the program level, as summarized in [Arnold and Bohner 1996]. Advantages of our approach include a precise semantics of dependence, and the ability to reason about the ripple effects of changes in high-level design decisions. We have provided a precise notion of impact analysis for logical design models.

9.5 Design Space Modeling

Batory [Batory and O’Malley 1992] uses formal models of software design spaces for systems that vary in component implementations. His work aims to support system generation and reuse. Jackson [Jackson 2002] used Alloy for object modeling with the goal of automatically proving properties of given models. Garlan et al. [Garlan and Notkin 1991; Abowd et al. 1995] used Z to formalize architectural styles in order to prove mainly behavioral properties of systems in these styles. Other researchers are exploring the use of CNs in design space search and optimization for complex embedded system design. The goal is to find good designs under constraints (e.g., [Mohanty et al. 2002]). Our aim and contribution, by contrast, is a logical theory of coupling in logical design space models. Design space modeling has also been studied by Bosch [Sinnema et al. 2004], Lane [Lane 1990] and Feather [Feather 2001] for product line design, design generation and optimization. Our purpose and approach differ. We seek to validate and explore the ideas of Baldwin and Clark for advances in software engineering research, in the direction of accounting for the connections between design structure and value. In addition, our formal approach is based on the combination of logical and non-logical elements to analyze design coupling structures and change impacts.

9.6 Architectural Description Language

Our work is related to work on software architecture [Abowd et al. 1995; Allen and Garlan 1997; Medvidovic and Taylor 1997]. Most such work is committed to an ontology of components and connectors. Logical variables and constraints are more general and expressive. Our models do capture a notion of architecture in the sense of design decisions (especially design rules) on which much depends. Stafford and Wolf [Stafford and Wolf 2001] studied architectural dependence analysis for architecture description languages. Our work, in contrast, is not confined to an architectural level, and is entirely formal.

9.7 Consistency Checking

Consistency management has been recognized as an important issue. Recent work of Nentwich et al. [Christian Nentwich and Finkelstein 2003] uses XML to express and check the consistency among distributed design documents, such as Z specifications and UML. They also use first-order logic as the semantics underlying the XML expressions, check consistency, and pinpoint the places that violate the constraints. Their work differs from ours in both expressiveness and analysis ability. Their work is based on standard design documents, such as UML, and is limited by them: it does not express higher-level abstract design decisions. As to the analysis results, they report only the parts that violate current constraints, not the ripple effects, that is, which other parts could be affected if the inconsistent parts are changed. Also, there are usually multiple ways to restore consistency. Their work doesn’t address this non-determinism either.
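The point that consistency can usually be restored in more than one way can be illustrated with a brute-force sketch. The variables, domains, and the single constraint below are hypothetical, and the enumeration is neither Simon's algorithm nor that of Nentwich et al.; it assumes that an acceptable repair is a consistent assignment whose set of additionally changed variables is minimal with respect to set inclusion.

```python
from itertools import product

# Hypothetical design space: each variable has a small domain of choices.
domains = {
    "load": ["low", "high"],
    "cache": ["off", "on"],
    "storage": ["slow", "fast"],
}

# Hypothetical constraint: a high load needs a cache or fast storage.
constraints = [
    lambda a: a["load"] == "low" or a["cache"] == "on" or a["storage"] == "fast",
]


def is_consistent(assignment):
    """True when every constraint holds for the full assignment."""
    return all(check(assignment) for check in constraints)


def minimal_repairs(current, variable, new_value):
    """Enumerate consistent assignments that adopt the change and alter
    an inclusion-minimal set of the other variables."""
    names = list(domains)
    candidates = []
    for values in product(*(domains[n] for n in names)):
        candidate = dict(zip(names, values))
        if candidate[variable] != new_value or not is_consistent(candidate):
            continue
        changed = {n for n in names if n != variable and candidate[n] != current[n]}
        candidates.append((changed, candidate))
    # Discard candidates that change a strict superset of another candidate's variables.
    return [
        cand
        for changed, cand in candidates
        if not any(other < changed for other, _ in candidates)
    ]


if __name__ == "__main__":
    start = {"load": "low", "cache": "off", "storage": "slow"}
    for repair in minimal_repairs(start, "load", "high"):
        print(repair)
```

Under these assumptions, raising `load` to `high` admits two distinct minimal repairs, one that turns the cache on and one that switches to fast storage. The exhaustive enumeration is exponential in the number of variables and is meant only to show the non-determinism, not to suggest a practical procedure.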

9.8 Design Structure Matrix

DSM modeling has been widely studied and used in other engineering realms, and is supported by tools such as DeMAID [James L 1996]. MacCormack et al. [MacCormack et al. ] and Sangal et al. [Sangal et al. 2005] are studying DSM modeling of software systems and their architectures based on source code analysis. Lattix [lattix ] has produced a commercially available tool for this purpose. The DSM is central to Baldwin and Clark’s account of modularity in design [Baldwin and Clark 2000]. Its significance for software design was first explored in depth by Sullivan et al. [Sullivan et al. 2001; Sullivan et al. 2005] and subsequently by Lopes et al. [Lopes and Bajracharya 2005], MacCormack [MacCormack et al. ], and Sangal et al. [Sangal et al. 2005]. We provide DSMs with rigorous semantics.

10. CONCLUSION

As a first step toward bridging the gap between software design and economic analysis, we are developing a formal design space modeling and analysis approach, and a supporting tool called Simon. A set of experiments has produced some evidence that our new theory might shed light on poorly understood connections between software design structures and economic value. Eventually such analysis might help the practicing designer to make design decisions on economically defensible grounds, even perhaps to make decisions leading to economically better results.

REFERENCES

Abowd, G. D., Allen, R., and Garlan, D. 1995. Formalizing style to understand descriptions of software architecture. ACM Transactions on Software Engineering and Methodology 4, 4 (Oct.), 319–64.
Alexander, C. W. 1970. Notes on the Synthesis of Form. Harvard University Press.
Allen, R. and Garlan, D. 1997. A formal basis for architectural connection. ACM Transactions on Software Engineering and Methodology 6, 3 (July), 213–49.
Arnold, R. and Bohner, S. 1996. Software Change Impact Analysis, First ed. Wiley-IEEE Computer Society Pr.
Ashby, W. 1952. Design for a Brain. John Wiley and Sons.
Baldwin, C. Y. and Clark, K. B. 2000. Design Rules, Vol. 1: The Power of Modularity. The MIT Press.
Batory, D. and O’Malley, S. 1992. The design and implementation of hierarchical software systems with reusable components. ACM Transactions on Software Engineering and Methodology 1, 4, 355–398.
Borning, A. 2004. Personal communication with Kevin Sullivan.
Brooks, F. 1987. No silver bullet: Essence and accidents of software engineering. IEEE Computer 20, 4 (Apr.), 10–19.
Cai, Y. and Sullivan, K. 2005. Simon: A tool for logical design space modeling and analysis. In 20th IEEE/ACM International Conference on Automated Software Engineering. Long Beach, California, USA.
Christian Nentwich, W. E. and Finkelstein, A. 2003. Flexible consistency checking.
Dijkstra, E. W. 1982. On the role of scientific thought. In Selected Writings on Computing: A Personal Perspective. Springer-Verlag, 60–66.
Eppinger, S. D. 1991. Model-based approaches to managing concurrent engineering. Journal of Engineering Design 2, 4, 283–290.
Feather, M. S. 2001. Risk reduction using DDP (defect detection and prevention): Software support and software applications. In RE. 288.
Freuder, E. C. 1989. Partial Constraint Satisfaction. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, IJCAI-89, Detroit, Michigan, USA. 278–283.
Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman & Company.
Garlan, D. and Notkin, D. 1991. Formalizing design spaces: Implicit invocation mechanisms. In Proceedings of the 4th International Symposium of VDM Europe on Formal Software Development-Volume I. Springer-Verlag, 31–44.
Jackson, D. 2002. Alloy: a lightweight object modelling notation. ACM Trans. Softw. Eng. Methodol. 11, 2, 256–290.
Jackson, D., Shlyakhter, I., and Sridharan, M. 2001. A micromodularity mechanism. In Proceedings of the 8th European software engineering conference held jointly with 9th ACM SIGSOFT international symposium on Foundations of software engineering. ACM Press, 62–73.
James L, R. 1996. DeMAID/GA - an enhanced design manager’s aid for intelligent decomposition. In Proceedings of 6th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization. Seattle, WA.
Lane, T. G. 1990. Studying software architecture through design spaces and rules. Tech. Rep. CMU/SEI-90-TR-18, CMU.
lattix. A commercial product. http://www.lattix.com/.
Lehman, M. M. 1985. Program Evolution: Processes of Software Change. Academic Press, London, UK, Chapter 12, 247–274.
Lopes, C. V. and Bajracharya, S. K. 2005. An analysis of modularity in aspect oriented design. In AOSD ’05. ACM Press, New York, NY, USA, 15–26.
MacCormack, A., Rusnak, J., and Baldwin, C. Exploring the structure of complex software designs: An empirical study of open source and proprietary code. Harvard Business School Working Paper Number 05-016.
Mackworth, A. 1977. Consistency in networks of relations. In Artificial Intelligence, 8. 99–118.
Medvidovic, N. and Taylor, R. N. 1997. A framework for classifying and comparing architecture description languages. SIGSOFT Software Engineering Notes 22, 6 (Nov.), 60–76.
Mohanty, S., Prasanna, V. K., Neema, S., and Davis, J. 2002. Rapid design space exploration of heterogeneous embedded systems using symbolic search and multi-granular simulation. SIGPLAN Not. 37, 7, 18–27.
Parnas, D. L. 1972. On the criteria to be used in decomposing systems into modules. Communications of the ACM 15, 12 (Dec.), 1053–8.
Sangal, N., Jordan, E., Sinha, V., and Jackson, D. 2005. Using dependency models to manage complex software architecture. In OOPSLA.
Simon, H. A. 1996. The Sciences of the Artificial, Third ed. The MIT Press.
Sinnema, M., Deelstra, S., Nijhuis, J., and Bosch, J. 2004. Covamof: A framework for modeling variability in software product families. In Proceedings of SPLC 2004. Vol. 3154. 197–213.
Spivey, M. 2000. The fuzz manual. URL: http://spivey.oriel.ox.ac.uk/~mike/fuzz/.
Stafford, J. A. and Wolf, A. L. 2001. Architecture-level dependence analysis for software systems. International Journal of Software Engineering and Knowledge Engineering 11, 4, 431–451.
Stevens, W. P., Myers, G. J., and Constantine, L. L. 1974. Structured design. IBM Systems Journal 13, 2, 115–39.
Steward, D. V. 1981. The design structure system: A method for managing the design of complex systems. IEEE Transactions on Engineering Management 28, 3, 71–84.
Sullivan, K., Cai, Y., Hallen, B., and Griswold, W. G. 2001. The structure and value of modularity in software design. SIGSOFT Software Engineering Notes 26, 5 (Sept.), 99–108.
Sullivan, K., Griswold, W., Song, Y., and et al., Y. C. 2005. Information hiding interfaces for aspect-oriented design. In ESEC/FSE ’05.
Tsang, E. 1993. Foundations of Constraint Satisfaction. Academic Pr., London and San Diego.
Yassine, A. A. 2004. An introduction to modeling and analyzing complex product development processes using the design structure matrix (DSM) method.
