A Methodology for Building a Repository of Object-Oriented Design Fragments

Tae-Dong Han, Sandeep Purao, and Veda C. Storey
J. Mack Robinson College of Business Administration, Georgia State University, P.O. Box 4015, Atlanta, Georgia 30302-4015
{than,spurao,vstorey}@gsu.edu
Abstract. Reuse is an important approach to conceptual object-oriented design. A number of reusable artifacts, and methodologies for using them, have been developed that require the designer to select a certain level of granularity and a certain paradigm. This makes retrieval and application of these artifacts difficult and prevents the simultaneous reuse of artifacts at different levels of granularity. A specific kind of artifact, the analysis pattern, spans these levels of granularity. Patterns, which represent groups of objects, facilitate further assembly into what we call design fragments. Design fragments can then be used as reusable artifacts in their own right. A methodology for building a repository of design fragments is presented that consists of core and variant design fragments. The effectiveness of the methodology is assessed by verifying the appropriateness of the design fragments generated through a clustering process.
1. Introduction

Reuse involves the design of new systems from prefabricated reusable artifacts [7]. Although a number of reusable artifacts at various granularity levels have been created, they have been less effective than desired. Class libraries and business objects, for example, are difficult to reuse since finding a component that fits into a specific application can be problematic. On the other hand, as the abstraction and granularity of an artifact increase, as found in domain and reference models, a greater degree of modification is required, either to the model or to the business process itself. One kind of reusable artifact, analysis patterns [2, 3], represents an opportunity to overcome these problems. Although patterns themselves are fairly small and at a high level of abstraction, they can be used and synthesized to create a larger-grained, specific reusable artifact, which may, in turn, be supplemented with analysis patterns.

Our prior research presented an approach to reuse-based design with analysis patterns [8, 9, 10]. The results suggested that analysis patterns can present many challenges to designers, such as searching for appropriate patterns, instantiating them in the problem domain, and synthesizing them to create an initial design. In this research, we address these challenges by creating a larger and more specific artifact, which we call a design fragment. The analysis patterns provide us with a means of structuring these design fragments in an indexed repository so that they can be easily accessed in new design situations.

J. Akoka et al. (Eds.): ER’99, LNCS 1728, pp. 203-217, 1999. © Springer-Verlag Heidelberg Berlin 1999

The objective of this research, therefore, is to:
develop a methodology for building a repository of reusable design fragments from previously developed object-oriented conceptual designs. We develop a mechanism that creates these design fragments from prior designs and indexes and stores them, so that an automated search can be performed to find the design fragments and possible extensions appropriate for new system requirements. The contribution of the research is to define design fragments, and present an approach to create, categorize, and index them in a repository that will greatly reduce search and modification costs for systems development. This paper is divided into six sections. Related research is described in section 2. Sections 3 and 4 outline our methodology and clustering algorithms. The results are demonstrated in section 5 by application to three new design situations. Section 6 concludes the paper.
2. Related Research

2.1 Analysis Patterns

An analysis pattern¹ is a group of generic objects, defined in a domain-neutral manner, which can be applied, by analogy, in different domains [2]. Each generic object, such as Actor or Transaction, has stereotypical properties and responsibilities. Libraries of analysis patterns have been created for reuse [2, 3]. For example, the pattern Participant – Transaction can be used in different domains to create instantiations, such as ‘cashier – sale’, ‘officer – incident’ or ‘clerk – reservation.’ Figure 1 shows the analysis pattern Participant – Transaction.
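[Figure 1: class diagram of the pattern Participant – Transaction, with the two objects joined by an association with multiplicities n and 1. Participant lists number, startDate, endDate, password, authorizationLevel, aboutMe, howMany, howMuch, calcOverTransactions, rankTransactions, inAuthorized, and calcForMe. Transaction lists number, date, time, status, aboutMe, calcForMe, and rateMe. A side list names the patterns Participant – Transaction, Group – Member, Actor – Participant, Container – Content, Place – Transaction, Plan – Step, and others.]

Fig. 1. An Analysis Pattern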
¹ Patterns [1] have been identified at the conceptual design stage (analysis patterns [2, 3]) as well as the detailed design stage (design patterns [4]).

2.2 Creating Synthesized Designs with Reuse of Analysis Patterns

Assembling systems with reuse of analysis patterns involves the retrieval, instantiation, and synthesis of patterns in the problem domain. In previous work [8, 9, 10], we have proposed a reuse-based design process to facilitate these tasks. As shown in Figure 2, the naïve approach to synthesizing analysis patterns into a design begins with a simple natural language assertion such as: “a system to track sales at different stores.” Using natural language, instead of creating a formal specification language, frees designers from the burden of learning and commanding a new language. These natural language assertions are parsed, retaining words such as ‘sale.’ Appropriate objects are identified such as ‘Transaction’ (for sale). Relevant patterns are retrieved, such as Place – Transaction. These patterns are instantiated, using keywords identified in the problem assertion, for example, ‘store – sale’ (for the pattern Place – Transaction). Finally, the instantiated patterns are synthesized using overlapping objects. The process can iterate with designer interaction.

[Figure 2: flow from the Patterns Library to Synthesized Designs through five steps: Identification of Significant Words, Identification of Objects, Retrieval of Relevant Patterns, Instantiation of Relevant Patterns, and Synthesis of Instantiated Patterns. The flow is labeled both as the Naïve Approach and as Augmented with Learning Mechanisms.]

Fig. 2. Prior Research leading to Synthesized Designs

We have augmented this naïve approach with learning mechanisms that exploit the Usage History and the structure of patterns (e.g. the participation of an object in multiple patterns), to improve the process and provide greater assistance to the designer. Figure 2 shows this approach. The synthesized designs created from these approaches represent the precursor to the methodology described in this paper. Each design contains multiple analysis patterns instantiated and synthesized to address a specific set of requirements. They form the key inputs to this research. We use clustering to create a new type of reusable artifact, design fragments.
2.3 Clustering as the Mechanism for Constructing Design Fragments

Clustering classifies a set of objects into sets that share common features. It involves unsupervised, inductive learning in which “natural classes” are found from elements of a set [5]. Clustering is based on close associations or shared characteristics, that is, a similarity measure. Clusters represent collections of elements whose intra-cluster similarity is high and inter-cluster similarity is low. In statistical clustering, this similarity can be decided by a numerical distance. However, numerical clustering cannot take into account the meaning of the elements or the clusters. Clustering can be conceptual, using semantic and syntactic knowledge about the elements. For example, in Figure 3, the points a and y are grouped into one cluster based on numerical methods. However, with knowledge of the circle, it is more appropriate to cluster x and y into group A, and a and b into group B [6]. We use mechanisms analogous to these to construct the design fragments.

[Figure 3: points labeled x, y, a, and b, arranged in two groups A and B.]

Fig. 3. An Example of Conceptual Clustering
3. Building Design Fragments

This section presents the methodology for constructing reusable design fragments as identified by clustering algorithms. The clustering is based on similarities in requirements statements and on similarities in the composition and ontological classifications of elements in the synthesized designs.

3.1 Design Fragments

A design fragment is a specific instantiation and arrangement of analysis patterns that addresses a specific set of requirements. It is a partial design, that is, a group of patterns that are instantiated and synthesized to address a specific set of requirements. Figure 6 shows sample design fragments. A design fragment embodies higher granularity than an analysis pattern (Figure 4). It provides the same level of specificity as a domain model since it is a partial design.
[Figure 4: Design Fragment, Analysis Pattern, and Domain Model positioned on two axes, Specificity and Granularity.]

Fig. 4. Design Fragment
3.2 Solution Architecture

Figure 5 shows our approach to building the design fragments. The area enclosed by the dotted line indicates our prior research, which generates conceptual designs by synthesizing analysis patterns. This research focuses on creating and classifying design fragments from designs assembled using our prior research. The classification proceeds as an application of successive clustering algorithms that exploit the (a) problem description, (b) design graph, and (c) ontological classification of pattern instantiations. The resulting groups of instantiated patterns represent different partial designs that we call design fragments. These are indexed and stored in a repository. The methodology is applied to designs “within” a domain since the objective is to create design fragments that are more “specific” than analysis patterns.
[Figure 5: the Designer supplies a Problem Description to the Pattern Synthesis Engine, which draws on the Pattern Library to produce a Conceptual Design (the part marked Purao & Storey 97a, 97b). The Clustering Algorithms (Keyword-Based Clustering, Common Pattern-Based Clustering, and Ontological Classification-Based Clustering) then populate the Design Fragments Repository.]

Fig. 5. Solution Architecture

3.3 A Methodology for Building Design Fragments

This section illustrates the methodology by demonstrating the creation and indexing of design fragments from ten designs in the human resource management domain. Figures 6 and 7 show (a) several designs in this domain generated by synthesizing analysis patterns, and (b) important keywords from the initial problem description documents identified using natural language parsing techniques. Although these ten designs are similar, only three (1, 7, and 8) are identical. Designers often need to create different designs for similar applications as dictated by their specific requirements. Each design represents instantiations and synthesis of a different set of patterns. Design 1 (Figure 6), for example, represents a synthesis of six patterns: Actor – Participant (company – employee), Participant – Transaction (employee – agreement), Place – Transaction (department – agreement), Transaction – Subsequent Transaction (agreement – subsequent agreement), Transaction – Transaction Line Item (agreement – agreement line item), and Transaction – Specific Item (agreement – specific item). The complete set of ten designs (from Figures 6 and 7) represents the input to our example.

These ten designs undergo successive clustering algorithms to identify, index, and cluster the design fragments. The output is a repository that stores classified design fragments based on three criteria: important keywords, common pattern sets, and the ontological classification of instantiations.

[Figure 6: graph of Design 1 with the nodes company, employee, department, agreement, subsequent agreement, agreement line item, and specific item. Important keywords: company, employee, agreement, specific item.]

Fig. 6. A Graph Theoretic Representation of a Design
[Figure 7: graph representations of Designs 2 through 10. The node layout is not recoverable from the extracted text; node labels include company, employee, department, agreement, subsequent agreement, agreement line item, specific item, contract, contract line item, performance, performance execution, skill, knowledge, and step. The important keywords listed under each design are:
Design 2: company, employee, agreement, agreement line item
Design 3: agreement, specific item, agreement line item, performance
Design 4: company, employee, agreement line item
Design 5: employee, agreement, specific item, agreement line item
Design 6: employee, agreement, specific item, performance
Design 7: employee, agreement, specific item, step
Design 8: employee, agreement line item, step
Design 9: company, performance, step
Design 10: employee, specific item, performance, step]

Fig. 7. Several Designs assembled from Analysis Patterns
3.3.1 Creating Keyword-Based Clusters

The designs are classified using keywords (k ∈ K) in the requirements statements. Each set of keywords, that is, a unique requirement, is denoted as Ri and defined by its important keywords: Ri(k1, k2, ...). We denote each cluster as a unique set of requirements (R1, R2, etc.). Each design is classified into one or more clusters depending upon the elements (keywords) that make up its requirements. The clustering algorithm interprets the relative frequencies of keyword pairs as measures of similarity. Table 1 shows the sample designs placed into nine clusters, plus a remainder. A given design may be classified into one or more clusters or may even form a cluster of one. Design 1 (D1) is classified into cluster R1 along with designs 5, 6, and 7, and into cluster R5 with design 2. Design 9 forms a cluster of one.

Table 1. Design clusters based on keywords

Keyword-based clusters                                   Designs in the cluster
R1(employee, agreement, specific item)                   D1, D5, D6, D7
R2(company, employee, agreement line item)               D2, D4
R3(employee, agreement, agreement line item)             D2, D5
R4(employee, agreement line item)                        D4, D8
R5(company, employee, agreement)                         D1, D2
R6(employee, specific item, step)                        D7, D10
R7(agreement, specific item, agreement line item)        D3, D5
R8(agreement, specific item, performance)                D3, D6
R9(employee, specific item, performance)                 D6, D10
R10 – Remainder                                          D9
3.3.2 Creating Common Pattern Set-Based Clusters

The designs are also clustered based on commonalities in the patterns used. A Common Pattern Set (CPSn) is the maximal set of patterns shared among the designs. For example, if some designs share only the patterns Actor – Participant – Transaction, that is, the composition of two analysis patterns, Actor – Participant and Participant – Transaction, this represents the CPS for those designs. CPS’s represent invariant occurrences of design fragments among different designs. They allow clustering of designs and identification of first-cut design fragments. Table 2 shows that five CPS’s (that is, design fragments) are identified from the ten sample designs. Design 1 (D1) is clustered into the CPS5 group with designs 7 and 8. Figure 8 shows the graphical representation of each CPS. The clusters represent cumulative design fragments, that is, the relationships between clusters represent aggregation. It is possible that the clustering process may diverge into multiple branches, leading to design fragments that do not obey a linear aggregation hierarchy.

Table 2. CPS-based Clusters
CPS-based design fragments, each followed by the designs in the cluster:

CPS1: Participant – Transaction
      Designs: D4
CPS2: Participant – Transaction, Transaction – Transaction Line Item
      Designs: D5, D10
CPS3: Participant – Transaction, Transaction – Subsequent Transaction, Transaction – Transaction Line Item, Transaction – Specific Item
      Designs: D6
CPS4: Actor – Participant, Participant – Transaction, Transaction – Subsequent Transaction, Transaction – Transaction Line Item, Transaction – Specific Item
      Designs: D2, D3, D9
CPS5: Actor – Participant, Participant – Transaction, Place – Transaction, Transaction – Subsequent Transaction, Transaction – Transaction Line Item, Transaction – Specific Item
      Designs: D1, D7, D8
Although the CPS’s are cumulative, the membership of designs in the clusters indicated by these CPS’s is not. This is because a CPS is designated as the maximal set of common patterns. For instance, CPS2 is fully contained in CPS3; that is, design 6 (D6) also contains CPS2. However, D6 is a member of the cluster defined by CPS3, but not CPS2. The clustering algorithm works by identifying the maximal set of common patterns at each iteration.
[Figure 8: nested diagrams of the five common pattern sets. CPS1 contains Participant and Transaction; CPS2 adds Transaction Line Item; CPS3 adds Subsequent Transaction and Specific Item; CPS4 adds Actor; CPS5 adds Place.]

Fig. 8. Common Pattern Sets

3.3.3 Refining the Clusters using Ontological Classification of Instantiations

The final result requires one more level of classification (not demonstrated here due to space limitations). Each CPS is refined according to how each object in the patterns is instantiated. One way to understand the semantics of an object is to map it to an ontology. We adopt the ontology of Storey et al. [11, 12] and store ontological classifications for each object in a pattern. For example, instantiations of Participant such as ‘agent’, ‘applicant’, ‘cashier’, or ‘clerk’ are ontologically categorized as person. Instantiations such as ‘manufacturer’, ‘supplier’, or ‘shipper’ have an ontological classification of tradable social structure. Similarly, Transaction can be instantiated into two ontological types, event and activity. Thus, for a pattern Participant – Transaction, four ontological groups exist: person – event, person – activity, tradable social structure – event, and tradable social structure – activity.

The number of types of ontological relationships for different CPS’s varies according to how many ontological types are available for each pattern within the CPS. More precisely, a CPS is presented as CPSnj, where j indicates the type of the ontological group. In CPS5, for example, both Actor and Participant can be instantiated as person (e.g., person, agent) or social structure (e.g., company, supplier). Transaction can be an event (e.g., withdrawal) or an activity (e.g., delivery). Place can be a tradable stationary place (e.g., bank), a non-tradable stationary place (e.g., region), or a non-stationary place (e.g., shelf). Thus, there may be numerous instantiations of CPS5, and it may need refinement accordingly.
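The count of four ontological groups for Participant – Transaction is the Cartesian product of the ontological types available to each object. The following is a minimal sketch of that enumeration; the dictionary `ONTOLOGICAL_TYPES` and the function `ontological_groups` are hypothetical names introduced here for illustration, and only the types named in the text for Participant and Transaction are included.

```python
from itertools import product

# Ontological types per generic object, as named in the text for the
# pattern Participant - Transaction (illustrative subset, not the full ontology).
ONTOLOGICAL_TYPES = {
    "Participant": ["person", "tradable social structure"],
    "Transaction": ["event", "activity"],
}


def ontological_groups(pattern):
    """Enumerate every combination of ontological types for the objects in a pattern."""
    return list(product(*(ONTOLOGICAL_TYPES[obj] for obj in pattern)))


groups = ontological_groups(("Participant", "Transaction"))
for group in groups:
    print(" - ".join(group))
```

The same enumeration would grow multiplicatively for a larger CPS, which is one way to read the remark that CPS5 may have numerous instantiations.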
4. Clustering Algorithms

The clustering mechanisms derive and index the design fragments using similarity measures in: (a) requirements, as captured by keywords, and (b) the patterns, their instantiations, and arrangements. Two clustering mechanisms are developed: 1) top-down, exploiting keywords, and 2) bottom-up, exploiting patterns for clustering.

4.1 Top-Down Algorithm

Step 1: Identify Keywords. The requirements statements for each prior design can be parsed to identify candidates for keywords. From these, a subset is identified as the ‘keywords’ for the domain. The candidate words (ki) with a higher relative frequency (kfi) represent the keywords in the domain. The relative frequency is the number of occurrences of a keyword (nki) across all problem statements, divided by the number of problem statements (nd), i.e. keyword relative frequency: kfi = nki / nd. A threshold may be used to determine the important keywords in the domain. For our example, we identify keywords from the ten examples: k1: company, k2: employee, k3: agreement, k4: specific item, k5: agreement line item, k6: performance, k7: step.

Step 2: Define each Requirements Statement as a Set of Keywords. Each design is represented as a set of important keywords, Dn[{ki}], for example, D50[k1, k3, k5, k9]. The designs are shown in Table 3. (See Figures 6 and 7.)

Table 3. Important keywords identified in each design
D1[k1,k2,k3,k4]   D2[k1,k2,k3,k5]   D3[k3,k4,k5,k6]   D4[k1,k2,k5]   D5[k2,k3,k4,k5]
D6[k2,k3,k4,k6]   D7[k2,k3,k4,k7]   D8[k2,k5,k7]      D9[k1,k6,k7]   D10[k2,k4,k6,k7]
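The relative frequency in Step 1 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it applies kfi = nki / nd to the keyword sets of Table 3 (which already contain only the important keywords, whereas the method would start from all candidate words), it counts a keyword once per design, and the threshold value of 0.4 is hypothetical.

```python
from collections import Counter

# Keyword sets per design, transcribed from Table 3.
DESIGNS = {
    "D1": {"k1", "k2", "k3", "k4"},
    "D2": {"k1", "k2", "k3", "k5"},
    "D3": {"k3", "k4", "k5", "k6"},
    "D4": {"k1", "k2", "k5"},
    "D5": {"k2", "k3", "k4", "k5"},
    "D6": {"k2", "k3", "k4", "k6"},
    "D7": {"k2", "k3", "k4", "k7"},
    "D8": {"k2", "k5", "k7"},
    "D9": {"k1", "k6", "k7"},
    "D10": {"k2", "k4", "k6", "k7"},
}


def keyword_relative_frequency(designs):
    """Return kf_i = nk_i / nd for every keyword occurring in the designs."""
    nd = len(designs)
    counts = Counter(k for keywords in designs.values() for k in keywords)
    return {k: counts[k] / nd for k in sorted(counts)}


def important_keywords(designs, threshold):
    """Keep keywords whose relative frequency reaches the (assumed) threshold."""
    kf = keyword_relative_frequency(designs)
    return [k for k, f in kf.items() if f >= threshold]


kf = keyword_relative_frequency(DESIGNS)
selected = important_keywords(DESIGNS, threshold=0.4)  # hypothetical threshold
print(kf)
print(selected)
```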
Step 3: Compute Similarity Measure. The similarity measure that clusters the designs is keyword cohesion, which indicates the degree of common keywords among requirements statements. However, computing cohesion of all possible combinations of keywords is an NP-complete problem. Thus, keyword cohesion, kcij, is computed only between every pair of keywords and exploited during clustering. The computation consists of kcij = nkikj / nd, where nkikj is the number of occurrences of keywords ki and kj together in the requirements statements, and nd is the number of designs. The keyword cohesion table is shown in Table 4; its entries are the co-occurrence counts nkikj, so dividing an entry by nd = 10 gives kcij.

Table 4. Keyword Cohesion

      k2   k3   k4   k5   k6   k7
k1     3    2    1    2    1    1
k2          5    5    4    2    3
k3               5    3    2    1
k4                    2    3    2
k5                         1    1
k6                              2
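The pairwise counts in Table 4 can be recomputed from the keyword sets of Table 3. The sketch below is illustrative; the function names are hypothetical, and the result is the co-occurrence count nkikj, with the cohesion kcij obtained by dividing by the number of designs.

```python
from itertools import combinations

# Keyword sets per design, transcribed from Table 3.
DESIGNS = {
    "D1": {"k1", "k2", "k3", "k4"},
    "D2": {"k1", "k2", "k3", "k5"},
    "D3": {"k3", "k4", "k5", "k6"},
    "D4": {"k1", "k2", "k5"},
    "D5": {"k2", "k3", "k4", "k5"},
    "D6": {"k2", "k3", "k4", "k6"},
    "D7": {"k2", "k3", "k4", "k7"},
    "D8": {"k2", "k5", "k7"},
    "D9": {"k1", "k6", "k7"},
    "D10": {"k2", "k4", "k6", "k7"},
}


def cooccurrence_counts(designs):
    """Count, for every keyword pair, the designs in which both keywords occur."""
    counts = {}
    for keywords in designs.values():
        for pair in combinations(sorted(keywords), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def keyword_cohesion(designs):
    """Return kc_ij = nk_ik_j / nd for every co-occurring keyword pair."""
    nd = len(designs)
    return {pair: n / nd for pair, n in cooccurrence_counts(designs).items()}


counts = cooccurrence_counts(DESIGNS)
cohesion = keyword_cohesion(DESIGNS)
top = max(counts.values())
top_pairs = sorted(pair for pair, n in counts.items() if n == top)
print(top_pairs, top / len(DESIGNS))
```

Under these inputs the three pairs with the highest count are the ones the clustering step starts from.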
Step 4: Cluster Designs using the Similarity Measures. The clustering process begins with the keyword pairs that appear most often in documents. [k2, k3], [k2, k4], and [k3, k4] have the highest cohesion rate, kc23 = kc24 = kc34 = 5/10 = 50%. Their union [k2, k3, k4] is first checked for coexistence and coverage in the requirements statements (Table 3). The requirements statements for four designs (D1, D5, D6, and D7) share all three keywords. These keywords cover 75% of the keywords in each design. We interpret this to mean that these four documents share a common ‘concept,’ denoted as Ci (same as Ri in Table 1). A threshold may be used to ensure that the current ‘concept’ covers a significant fraction of the requirements statement. For this example, we use a threshold of 60%.² Since the threshold is met by all designs, they are placed in a cluster.

The next highest cohesion rate is kc25 (4/10 = 40%). The requirements statements for four designs (D2, D4, D5, and D8) share these keywords [k2, k5]. However, they cover only 50%, 67%, 50%, and 67% (respectively) of the requirements. Since the threshold (60%) is not met for all designs, the next step is to find a common set of keywords (a ‘concept’), in all possible combinations of these designs, that satisfies the threshold. As a result, three sets of keywords are identified that are shared by different combinations of designs. [k1, k2, k5] are shared by two designs (D2 and D4) at coverage levels of 75% and 100%, respectively. [k2, k3, k5] are also shared by two designs (D2 and D5) at the 75% coverage level. And [k2, k5] are shared by D4 and D8 at the 67% coverage level. Thus, D2 and D4 are clustered into C2(k1, k2, k5), D2 and D5 into C3(k2, k3, k5), and D4 and D8 into C4(k2, k5). By repeating this process for each successive cohesion level in Table 4, nine concepts are identified (Table 1). This algorithm is:

The Top-Down Algorithm
1. Find pairs of keywords with the next highest cohesion frequency.
2. Find the union of keywords from these pairs.
3. Identify all req’t statements where keywords in this keyword union exist, ‘n’.
4. Compute req’t_coverage, i.e. ‘keywords in keyword union’ / ‘keywords in req’t statements’ for all these designs.
5. Cluster all designs only when req’t_coverage >= threshold is true for all designs. If step 5 cannot be satisfied then continue to step 6, otherwise return to step 1.
6. Find maximal subset of keywords in all possible combinations of (n-1) requirements statements.
7. Compute req’t_coverage for each combination of designs.
8. Cluster all designs only when req’t_coverage >= threshold is true for each combination of designs. If any more designs remain, set n to n-1 and return to step 6, else return to step 1, until n-1 >= 2.
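The coverage test and the fallback over smaller combinations of designs (steps 3 through 8) can be sketched as below. This is a simplified reading, not a definitive implementation: the outer loop over cohesion levels is omitted, designs already clustered are not removed before smaller combinations are tried, and all function names are hypothetical. The keyword sets come from Table 3 and the 60% threshold from the worked example.

```python
from itertools import combinations

# Keyword sets per design, transcribed from Table 3.
DESIGNS = {
    "D1": {"k1", "k2", "k3", "k4"},
    "D2": {"k1", "k2", "k3", "k5"},
    "D3": {"k3", "k4", "k5", "k6"},
    "D4": {"k1", "k2", "k5"},
    "D5": {"k2", "k3", "k4", "k5"},
    "D6": {"k2", "k3", "k4", "k6"},
    "D7": {"k2", "k3", "k4", "k7"},
    "D8": {"k2", "k5", "k7"},
    "D9": {"k1", "k6", "k7"},
    "D10": {"k2", "k4", "k6", "k7"},
}


def coverage(concept, keywords):
    """Fraction of a design's keywords accounted for by the concept."""
    return len(concept) / len(keywords)


def cluster_if_covered(concept, designs, threshold):
    """Steps 3-5: designs containing the whole concept, and whether all meet the threshold."""
    members = {name: coverage(concept, kws)
               for name, kws in designs.items() if concept <= kws}
    accepted = bool(members) and all(c >= threshold for c in members.values())
    return members, accepted


def fallback_clusters(candidates, designs, threshold):
    """Steps 6-8 (simplified): try ever smaller combinations of the candidate designs."""
    found = {}
    for size in range(len(candidates) - 1, 1, -1):
        for combo in combinations(sorted(candidates), size):
            # Maximal keyword set shared by every design in the combination.
            shared = frozenset.intersection(*(frozenset(designs[d]) for d in combo))
            if shared and all(coverage(shared, designs[d]) >= threshold for d in combo):
                found.setdefault(shared, set()).update(combo)
    return found


members, accepted = cluster_if_covered({"k2", "k3", "k4"}, DESIGNS, 0.6)
partial, ok = cluster_if_covered({"k2", "k5"}, DESIGNS, 0.6)
subclusters = fallback_clusters(partial, DESIGNS, 0.6)
print(members, accepted)
print(subclusters)
```

Run on the example, the first call accepts D1, D5, D6, and D7 at 75% coverage, and the fallback for [k2, k5] yields the three keyword sets that the text names C2, C3, and C4.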
² When none or few designs meet the designer-determined threshold, it means that few requirements statements share common concepts, that is, the designs should be clustered separately.

4.2 Bottom-up Algorithm

The bottom-up clustering algorithm is composed of two steps: first, identifying common pattern sets (CPS’s) as features that cluster designs, and second, further discriminating each cluster of designs (based on CPS) by its ontological meaning.
Step 1: Identify Common Pattern Sets. Finding CPS’s among designs begins with a seed pattern set (Ps), that is, the most frequently used pattern(s) in all designs. Continuing the example, the pattern Participant – Transaction represents the seed pattern. The designs are divided into two groups: one that includes this pattern and another that does not. Here, all ten designs include the seed pattern. Within each group of designs, the next most frequently shared pattern set is identified, and the grouping is repeated. The Transaction – Transaction Line Item pattern meets this criterion in the group that has the seed pattern. It is shared by nine designs (all except design 4 (D4)). The process continues until there are no more shared patterns. Application of the algorithm is shown in Figure 9. The algorithm can be used with a threshold (e.g. the most common pattern must be shared by >50% of the group). This algorithm is:

The Bottom-up Algorithm
1. Find the next seed pattern(s) (within threshold) in the current set of designs.
2. Name this set of seed pattern(s) as common pattern set (CPSi).
3. Create a subset of designs where CPSi ⊆ Dk and another subset where CPSi ⊄ Dk.
4. Redefine the subset with CPSi ⊆ Dk as the current set of designs.
5. Repeat 1 through 4 until no further CPS is found.

The result of this process is the membership of different designs in CPS’s. For example, designs D1, D7, and D8 share CPS5, whereas designs D2, D3, and D9 share CPS4. Table 2 shows the correspondence between CPS’s and different designs.

[Figure 9: starting from the set of all designs {Dk}, each level splits the current designs into {Dk} | CPSi ⊆ Dk and {Dk} | CPSi ⊄ Dk. The pattern sets added at each level are:
PS0 = Participant – Transaction; CPS1 = {PS0}
PS1 = Transaction – Transaction Line Item; CPS2 = CPS1 ∪ {PS1}
PS2 = Transaction – Subsequent Transaction, Transaction – Specific Item; CPS3 = CPS2 ∪ {PS2}
PS3 = Actor – Participant; CPS4 = CPS3 ∪ {PS3}
PS4 = Place – Transaction; CPS5 = CPS4 ∪ {PS4}]

Fig. 9. Application of the Algorithm for Identifying Common Pattern Sets
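The bottom-up steps can be sketched as a loop that repeatedly grows the CPS with the most frequent remaining pattern(s) and sets aside the designs that do not contain the grown set. The sketch below rests on a loud assumption: the text states the full pattern set only for Design 1, so each design is modeled here as containing exactly the patterns of its CPS in Table 2. The function name is hypothetical, no threshold is applied, branching is not handled, and designs lacking the very first seed pattern are simply dropped.

```python
from collections import Counter

PT = "Participant - Transaction"
TLI = "Transaction - Transaction Line Item"
TST = "Transaction - Subsequent Transaction"
TSI = "Transaction - Specific Item"
AP = "Actor - Participant"
PLT = "Place - Transaction"

# ASSUMPTION: pattern sets per design are inferred from Table 2, not given in the paper.
TWO = {PT, TLI}
FOUR = TWO | {TST, TSI}
FIVE = FOUR | {AP}
SIX = FIVE | {PLT}
DESIGN_PATTERNS = {
    "D1": SIX, "D2": FIVE, "D3": FIVE, "D4": {PT}, "D5": TWO,
    "D6": FOUR, "D7": SIX, "D8": SIX, "D9": FIVE, "D10": TWO,
}


def bottom_up_cps(designs):
    """Return a list of (CPS, designs whose maximal shared pattern set is that CPS)."""
    current = {name: frozenset(pats) for name, pats in designs.items()}
    cps = frozenset()
    clusters = []
    while current:
        # Step 1: most frequent pattern(s) not yet in the CPS.
        counts = Counter(p for pats in current.values() for p in pats - cps)
        if not counts:
            clusters.append((cps, set(current)))
            break
        top = max(counts.values())
        grown = cps | {p for p, c in counts.items() if c == top}  # Step 2
        # Step 3: split designs by whether they contain the grown CPS.
        keep = {n: pats for n, pats in current.items() if grown <= pats}
        dropped = set(current) - set(keep)
        if dropped and cps:
            clusters.append((cps, dropped))  # these designs stop at the previous CPS
        cps, current = grown, keep  # Step 4
    return clusters


clusters = bottom_up_cps(DESIGN_PATTERNS)
for i, (cps, members) in enumerate(clusters, start=1):
    print(f"CPS{i}: {len(cps)} patterns, designs {sorted(members)}")
```

Recording a design under the last CPS it fully contains mirrors the point made in section 3.3.2 that membership is by maximal set: D6 contains CPS2 but is listed only under CPS3.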
Step 2: Discriminate CPS’s using ontological classification of instantiations. The CPS’s provide an approximation of the shared design elements, the analysis patterns, in the different designs. However, the same pattern may be instantiated in very different ways in different designs. For example, Participant – Transaction may be instantiated as ‘cashier’ for Participant and ‘session’ for Transaction in one design (Dk), and as ‘supplier’ for Participant and ‘shipment’ for Transaction in another design (Dl). In Dk, the instantiation ‘cashier’ suggests the ontological classification person, whereas in Dl the instantiation ‘supplier’ suggests the ontological classification tradable social structure. Although similar patterns are used in the two designs, their specific instantiations are different. By comparing the ontological classification of the instantiations, we can further refine the designs within each CPS group.

Consider a common pattern set, CPSk, in which three designs (say D1, D2, and D3) are classified, and suppose that part of the CPS is the Participant – Transaction pattern. It is instantiated as ‘employee’ (ontological classification person) – ‘contract’ (ontological classification event) in two designs (say D1 and D2); that is, it has the same corresponding ontological classification. The other design, D3, has an instantiation ‘employee’ (ontological classification person) – ‘session’ (ontological classification activity), which is a different ontological classification. As a result, CPSk is further refined into CPSk1 (with D1 and D2) and CPSk2 (with D3). The algorithm for this refinement is shown below and an application is shown in Figure 10.

Ontological Refinement Algorithm
1. In all designs that share the next lowest CPS, determine the ontological classification of instantiations of all patterns in that CPS (say CPSz). Let n = 0 to track refined CPS’s.
2. Determine the next maximal subset of patterns from the current CPS such that the ontological classifications of their instantiations are equivalent.
3. Create a new CPSz(n+1) consisting of that maximal subset of patterns.
4. Redefine the current CPS as CPSz1 – CPSz(n+1). If this is not NULL, return to step 2. Increase n by 1.
5. Repeat by returning to step 1.

[Figure 10: a common pattern set CPSz, defined by {PS} and {Dk}, is split into CPSz1 (instantiation of a subset of {PS} ∈ Dk with the same ontological classifications) and CPSz2 (instantiation of a subset of {PS} ∈ Dk’ with different ontological classifications), and the step is repeated.]

Fig. 10. Refining Common Pattern Sets
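The CPSk example above can be reproduced with a small grouping sketch. It is a simplification of the refinement algorithm: designs are grouped by the complete tuple of ontological classifications of their instantiations, rather than by iteratively extracting maximal subsets of patterns. The lookup table `ONTOLOGY` is a hypothetical stand-in for the ontology of [11, 12] and contains only classifications stated in the text.

```python
# Hypothetical lookup table; entries are the classifications given in the text.
ONTOLOGY = {
    "employee": "person",
    "cashier": "person",
    "supplier": "tradable social structure",
    "contract": "event",
    "session": "activity",
}

# Instantiations of the pattern Participant - Transaction in the three designs
# of the CPSk example.
CPSK_INSTANTIATIONS = {
    "D1": {"Participant - Transaction": ("employee", "contract")},
    "D2": {"Participant - Transaction": ("employee", "contract")},
    "D3": {"Participant - Transaction": ("employee", "session")},
}


def refine_cps(instantiations, label="CPSk"):
    """Group designs whose pattern instantiations share the same ontological classifications."""
    groups = {}
    for design, patterns in instantiations.items():
        signature = tuple(sorted(
            (pattern, tuple(ONTOLOGY[term] for term in terms))
            for pattern, terms in patterns.items()
        ))
        groups.setdefault(signature, []).append(design)
    # Number the refined sets in order of first appearance: CPSk1, CPSk2, ...
    return {f"{label}{j}": members
            for j, members in enumerate(groups.values(), start=1)}


refined = refine_cps(CPSK_INSTANTIATIONS)
print(refined)
```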
5. Assessment of Results

The reusable artifacts, design fragments, and the methodology developed provide the set of clustered design fragments shown in Table 2 and Figure 8. The process described above, thus, parses the available designs in a domain to create design fragments that may be (a) accessed, (b) suggested, (c) extended, and (d) refined. Keyword-based clustering provides (a) an access mechanism to design fragments. Common-pattern-set-based clustering (b) provides suggestions for design fragments and (c) extends suggestions to alternative designs. Finally, the ontological-classification-based clusters can be used to (d) refine the chosen design fragments. We illustrate the usefulness of our approach using three cases from the same domain. Table 5 shows the results.

Table 5. Suggested Starting Points for Three New Design Situations
Case
Keywords
‘Concepts’ identified
Designs correspondi ng to this ‘concept’ (Table 1)
Possible starting points for the designer Design Designs in fragments that cluster (CPS’s) where (Table 2) these designs are clustered Case 1: “To keep track of personal information about employee, to efficiently manage the payroll, benefits and other information related to employee, and to help decision-making step on employee’s incentive plan and hiring process.” 1 Employee, C6 D7 CPS5 D1 step D7 D8 D10 CPS2 D5 D10 Case 2: “To track personal information about company employee, to identify current employee job category, classification, and salary, to track employee evaluations and salary changes, and to track the benefits being used by each employee.” 2 Company, C2 D2 CPS4 D2 employee D3 D9 D4 CPS1 D4 C5 D1 CPS5 D1 D7 D8 D2 CPS4 D2 D3 D9 Case 3: “To improve recruitment contract process, to increase knowledge base about company, and to ensure fast tracking of employee agreement.” 3 Company, D1 CPS5 D1 C5 employee, D7 agreement D8 D2 CPS4 D2 D3 D9 For case 1, the designer enters the problem description and, ‘employee’ (k2) and ‘step’ (k7) are identified as important keywords (Refer to Step 1 in section 4.1). From
216
Tae-Dong Han et al.
these, the common keyword concept (C6) is identified, and the two designs clustered under that concept (D7 and D10) are retrieved. However, they have quite different shapes; that is, they are classified into two different design groups by the CPS criterion. Since D7 belongs to the CPS5 cluster (see Table 2) and D10 to the CPS2 cluster, CPS2 and CPS5 are suggested to the designer as core design fragments. If the designer selects design fragment CPS2, the designer may select D5 or D10 as the starting point. If the designer selects design fragment CPS5, the designer may select D1, D7, or D8 as the starting point. The designer may extend his/her own designs from the instantiated design fragments suggested (CPS2 or CPS5), or from the complete designs suggested (D5, D10, D1, D7, or D8). Thus, for each case, several designs are immediately derived from a textual description.

For case 2, two concepts (C2 and C5) are identified from the two keywords ‘company’ and ‘employee,’ and three designs (D1, D2, and D4) are retrieved. Three design fragments (CPS1, CPS4, and CPS5) are suggested, and D3 and D9 (from CPS4) or D7 and D8 (from CPS5) are suggested as alternative designs. Since CPS1 has only one design (D4 itself), no further alternative design is suggested.

For case 3, the three important keywords exactly match C5. Two core design fragments, CPS4 and CPS5, are suggested, along with D3 and D9 (from CPS4) or D7 and D8 (from CPS5) as alternative designs. More core design fragments and alternative designs may be suggested as the repository grows.

The results for these three cases show that a repository of design fragments can be used to make immediate, reliable suggestions of relevant core design fragments and alternative designs for the development of a new design from a plain-text problem description. Prior designs can be reused without much modification.
Since variants of designs are stored in the repository, the designer can navigate the repository to find a design that fits the new situation. Once the repository is sufficiently large, it can be expected to contain enough design variants for a specific domain that the designer can find a suitable one.
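The retrieval walk-through for the three cases can be sketched in a few lines of Python. This is only an illustrative sketch, not the prototype's implementation. The concept-to-design and CPS-to-design mappings are copied from Table 5. The keyword sets attached to C2, C5, and C6, and the rule that a concept matches when it contains every entered keyword, are assumptions chosen so that the sketch reproduces Table 5; the actual matching rule is the one defined in Step 1 of Section 4.1. The names `CONCEPT_KEYWORDS`, `CONCEPT_DESIGNS`, `CPS_DESIGNS`, and `suggest` are hypothetical.

```python
# Concept -> designs clustered under that concept (as listed in Table 5).
CONCEPT_DESIGNS = {
    "C2": ["D2", "D4"],
    "C5": ["D1", "D2"],
    "C6": ["D7", "D10"],
}

# CPS cluster -> designs in that cluster (as listed in Table 5).
CPS_DESIGNS = {
    "CPS1": ["D4"],
    "CPS2": ["D5", "D10"],
    "CPS4": ["D2", "D3", "D9"],
    "CPS5": ["D1", "D7", "D8"],
}

# ASSUMPTION: hypothetical keyword sets per concept, picked only so that
# the matching rule below reproduces the three cases of Table 5.
CONCEPT_KEYWORDS = {
    "C2": {"company", "employee"},
    "C5": {"company", "employee", "agreement"},
    "C6": {"employee", "step"},
}


def suggest(keywords):
    """Return (concepts, designs, fragments) for the entered keywords.

    ASSUMPTION: a concept matches when its keyword set contains every
    entered keyword.
    """
    query = {k.lower() for k in keywords}
    concepts = sorted(c for c, kws in CONCEPT_KEYWORDS.items() if query <= kws)

    # Collect the designs of all matching concepts, without duplicates.
    designs = []
    for concept in concepts:
        for design in CONCEPT_DESIGNS[concept]:
            if design not in designs:
                designs.append(design)

    # A CPS cluster is suggested as a core design fragment when it holds at
    # least one retrieved design; its members are the alternative designs.
    fragments = {
        cps: list(members)
        for cps, members in CPS_DESIGNS.items()
        if any(d in members for d in designs)
    }
    return concepts, designs, fragments


if __name__ == "__main__":
    for case in (["employee", "step"],
                 ["company", "employee"],
                 ["company", "employee", "agreement"]):
        print(case, "->", suggest(case))
```

Under these assumptions, the keywords of case 1 select concept C6, designs D7 and D10, and fragments CPS2 and CPS5, in line with the first row of Table 5.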
6. Conclusion
We have proposed a new kind of reusable artifact, the design fragment, and provided a methodology for building a repository of design fragments for efficient reuse. The methodology focuses on retaining all prior designs, creating design fragments through clustering, and storing them in an organized manner. The approach provides several advantages. First, no additional effort is required to build the reusable designs. When generating a design, the designer does not have to plan for its future reuse, for example, by trying to find common rules that can be reused across many slightly different situations. Second, since design fragments are indexed based on similarity measures, appropriate designs can be found easily. Third, a design fragment is a whole design that is developed and indexed based on common patterns. Once a design appropriate for the new system is found, it provides an immediate solution without much modification. Fourth, a design fragment has a higher granularity than analysis patterns and the same specificity as a domain model, which makes it easier to reuse. We are developing a prototype to implement the methodology, and plan to populate it with designs obtained from design sessions
using a web-based interface, which will be developed using Java™. Clustering algorithms are being developed using C++. The ontological classifications and instantiations corresponding to them are stored together with analysis patterns in the Pattern Library, which is implemented in a relational database. The prototype system incorporates an existing system [11] to implement the ontology for the new instantiations, which are classified into a proper ontology and, in turn, stored in the Pattern Library. Use of the prototype should lead to a repository that includes multiple indexed designs for many different application domains, and facilitate further testing.
References
1. Alexander, C., S. Ishikawa, M. Silverstein, M. Jacobson, I. Fiksdahl-King, and S. Angel, A Pattern Language, Oxford University Press, New York, 1977.
2. Coad, P., D. North, and M. Mayfield, Object Models: Strategies, Patterns, and Applications, Prentice Hall, 1995.
3. Fowler, M., Analysis Patterns: Reusable Object Models, Addison-Wesley, 1997.
4. Gamma, E., R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.
5. Michalski, R. S., “Knowledge Acquisition Through Conceptual Clustering: A Theoretical Framework and Algorithm for Partitioning Data Into Conjunctive Concepts,” International Journal of Policy Analysis and Information Systems, Vol. 4, 1980, pp. 219-243.
6. Michalski, R. S. and R. E. Stepp, “Learning from Observation: Conceptual Clustering,” In Machine Learning: An Artificial Intelligence Approach by Michalski, R. S., J. G. Carbonell, and T. M. Mitchell (Eds.), Vol. 1, Morgan Kaufmann, Los Altos, CA, 1983, pp. 331-363.
7. Mili, H. et al., “Reusing Software: Issues and Research Directions,” IEEE Transactions on Software Engineering, June 1995, pp. 528-562.
8. Purao, S. and V. Storey, “Intelligent Support for Selection and Retrieval of Patterns for Object-Oriented Design,” In Proceedings of the 16th International Conference on Conceptual Modeling (ER'97), Los Angeles, 3-6 November, 1997a.
9. Purao, S. and V. Storey, “APSARA: A Web-based Tool to Automate System Design via Intelligent Pattern Retrieval and Synthesis,” In Proceedings of the 7th Workshop on Information Technologies & Systems, Atlanta, GA, Dec. 1997b, pp. 180-189.
10. Purao, S., V. Storey, and T. Han, “Improving Reuse-based System Design with Learning,” Working Paper, 1998.
11. Storey, V., D. Dey, H. Ullrich, and S. Sundaresan, “An Ontology-Based Expert System for Database Design,” Data and Knowledge Engineering, 1998.
12. Storey, V., H. Ullrich, and S. Sundaresan, “An Ontology to Support Automated Database Design,” In Proceedings of the 16th International Conference on Conceptual Modeling (ER'97), Los Angeles, 3-6 November, 1997, pp. 2-16.