Engineering Optimization

This article was downloaded by:[University of the West of England] [University of the West of England] On: 19 June 2007 Access Details: [subscription number 773569630] Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Engineering Optimization Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713641621

A cross-disciplinary technology transfer for search-based evolutionary computing: from engineering design to software engineering design To cite this Article: Simons, C. L. and Parmee, I. C. , 'A cross-disciplinary technology transfer for search-based evolutionary computing: from engineering design to software engineering design', Engineering Optimization, 39:5, 631 - 648 To link to this article: DOI: 10.1080/03052150701382974 URL: http://dx.doi.org/10.1080/03052150701382974


Downloaded By: [University of the West of England] At: 14:57 19 June 2007

Engineering Optimization Vol. 39, No. 5, July 2007, 631–648

A cross-disciplinary technology transfer for search-based evolutionary computing: from engineering design to software engineering design

C. L. SIMONS* and I. C. PARMEE

Advanced Computation in Design and Decision Making Laboratory, Faculty of Computing, Engineering and Mathematics, University of the West of England, Bristol, BS16 1QY, UK

Although object-oriented conceptual software design is difficult to learn and perform, computational tool support for the conceptual software designer is limited. In conceptual engineering design, however, computational tools exploiting interactive evolutionary computation (EC) have shown significant utility. This article investigates the cross-disciplinary technology transfer of search-based EC from engineering design to software engineering design in an attempt to provide support for the conceptual software designer. Firstly, genetic operators inspired by genetic algorithms (GAs) and evolutionary programming are evaluated for their effectiveness against a conceptual software design representation using structural cohesion as an objective fitness function. Building on this evaluation, a multi-objective GA inspired by a non-dominated Pareto sorting approach is investigated for an industrial-scale conceptual design problem. Results obtained reveal a mass of interesting and useful conceptual software design solution variants of equivalent optimality – a typical characteristic of successful multi-objective evolutionary search techniques employed in conceptual engineering design. The mass of software design solution variants produced suggests that transferring search-based technology across disciplines has significant potential to provide computationally intelligent tool support for the conceptual software designer.

Keywords: Evolutionary computation; Search; Engineering design; Software engineering

1. Introduction

Empirical evidence suggests that the process of identifying candidate classes in object-oriented conceptual software design is difficult for human designers to learn and perform (Guindon 1990, Glass 2003, Svetinovic et al. 2005). Comprehension of conceptual software designs is subjective, and designer performance may vary more than 5:1 from the best designers to the worst (Brooks 1995). The negative impact and cost of these difficulties are hard to quantify, but poor conceptual software designs have significant deleterious downstream consequences for software development. Furthermore, computational tool support for conceptual software design is limited. While some attempts have been made to harness Natural Language Processing (Mich and Garigliano 2002, Liu et al. 2004), computational tools that provide interactive design support for the software designer are not yet evident in the research literature.

*Corresponding author. Email: [email protected]

Engineering Optimization ISSN 0305-215X print/ISSN 1029-0273 online © 2007 Taylor & Francis http://www.tandf.co.uk/journals DOI: 10.1080/03052150701382974


A previous survey by Simons et al. (2003) compares the design lifecycles of typical methodologies found in engineering design with those found in software engineering design. While differences are evident due to the different nature of the products being designed, correspondence between design phases exists nevertheless. For example, both exhibit a product planning/requirements phase, followed by a conceptual design/analysis phase. This is typically followed by an embodiment/physical design phase, then a detailed design/implementation and test phase. Moreover, computational support for conceptual engineering design has received significant attention within the evolutionary computation (EC) domain. For example, Parmee (2001) reports the successful application of EC over a range of conceptual engineering design projects. Parmee (2002) also reports on improving conceptual design representation via an iterative EC/designer approach and further proposes human-centric intelligent systems for design exploration and knowledge discovery (Parmee 2005). In a cross-disciplinary transfer of technology, we propose that an engineering design EC search-based approach may also benefit software engineering conceptual design.

Applying search space exploration and optimization to software engineering was proposed by Harman and Jones in 2001, and since then search-based approaches to software engineering have been increasingly researched in all phases of the software development lifecycle. An example of software quality enhancement by multi-objective search is the predicted ranking of software modules according to the probability that they contain defects, in order to assign scarce resources to the modules with the highest likelihood of containing defects (Khoshgoftaar et al. 2004). Software clustering and subsystem decomposition have also been widely investigated (e.g. Seng et al. 2005). However, such quality enhancement and clustering algorithms are typically applied downstream, after the human designer has designed and built the software. This differs significantly from the approach advocated in this article, where computational search and exploration supports the designer during the process of conceptual software design. This article’s approach is consistent with other upstream approaches of search-based evolutionary algorithms within software engineering, such as the application of clustering techniques to software component architecture design (Lo and Chang 2004).

It is important to state that the approach taken in this article seeks not to replace the human designer but to support him/her by enabling the interactive exploration and exploitation of conceptual design search space. Such an interactive system could support both experts in novel and uncertain problem domains and novice designers who might promptly see the consequences of interacting with the tool to direct the search along various avenues.

To support this, this article reports an evaluation of genetic operators as the basis of interactive tool support for the software designer. Firstly, the representation of the software design problem and solution space is described. Next, objective fitness functions are presented. Investigations into variety-promoting operators for single-objective evolutionary algorithms are then presented for a case study. This is followed by the results of a multi-objective genetic algorithm (GA) for a further industrial-scale case study. Based on the results, the article concludes with an evaluation of the potential benefits of a cross-disciplinary technology transfer of interactive EC search from conceptual engineering design to conceptual software engineering design.

2. Representation

For effective search and exploration, it is necessary to represent both the design problem and the design solution search space. Both have been described previously by Simons and Parmee (2005) and are summarized as follows. The design problem is represented via use cases, while the design solution is represented as classes. Representing the design problem as use cases (Jacobson 1992) has been widely applied in software engineering. A use case describes a set of scenarios of interaction by means of chronological sequences of steps. Such steps may be recorded as text in a number of possible formats, reflecting the formality and granularity of system behavior being elicited. Cockburn (2001) refers to a step as an ‘action step’ and suggests that the textual grammar of an action step should be Subject…verb…direct object…prepositional phrase. For example:

The system…deducts…the amount…from the account balance.

In any scenario, the subject of an action step is either the actor or the system, depending on where the control lies at that point in the sequence. From the textual narrative of the use case, it is possible to identify actions (and their parameters) that the software is to perform, together with individual items of data that the software will manipulate. If an action and a datum are co-located in a step of the narrative, the action is said to ‘use’ the datum. Thus the problem domain can be specified by three finite sets extracted from the use cases of the problem: (i) a set of unique atomic data (ii) a set of unique actions and (iii) a set of uses. The structure of object-oriented software designs is expressed by classes, which act as placeholders for attributes (or data items) and methods (or functions). The representation of object-oriented designs used in this article comprises classes, attributes and methods. Each attribute in the design solution is derived directly from each member of the set of data specified in the problem domain, while each method is derived directly from each member of the set of actions specified in the problem domain. Thus the design solution search space comprises: (i) a set of methods (and their parameters) directly derived from the set of actions in the problem domain and (ii) a set of attributes, directly derived from the set of data of the design problem. At this point, the notion of class is introduced to the design solution search space as a grouping for methods and attributes. There are many possible ways in which attributes and methods may be allocated and grouped into classes; however one constraint is enforced: each class must contain at least one attribute and at least one method. The resulting representation of the search space is thus a scattered landscape of highly discrete design solutions. A simple example of the representation is shown in figure 1. 
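The representation described above can be sketched in a few lines. This is a minimal illustration only: the names `DesignProblem`, `DesignClass` and `is_valid`, and the toy data, actions and uses, are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DesignProblem:
    data: frozenset      # unique atomic data extracted from the use cases
    actions: frozenset   # unique actions extracted from the use cases
    uses: frozenset      # (action, datum) pairs co-located in an action step


@dataclass
class DesignClass:
    attributes: set = field(default_factory=set)  # derived from data
    methods: set = field(default_factory=set)     # derived from actions


def is_valid(problem, design):
    """Return True if `design` (a list of DesignClass) is a legal solution."""
    attrs = sorted(a for c in design for a in c.attributes)
    meths = sorted(m for c in design for m in c.methods)
    # Every datum and every action is allocated to exactly one class.
    complete = attrs == sorted(problem.data) and meths == sorted(problem.actions)
    # Constraint from the text: each class holds at least one attribute
    # and at least one method.
    non_empty = all(c.attributes and c.methods for c in design)
    return complete and non_empty


# Hypothetical toy problem (illustrative names only).
problem = DesignProblem(
    data=frozenset({"amount", "account balance", "receipt number"}),
    actions=frozenset({"deduct", "print receipt"}),
    uses=frozenset({("deduct", "amount"), ("deduct", "account balance"),
                    ("print receipt", "receipt number"),
                    ("print receipt", "amount")}),
)
design = [
    DesignClass({"amount", "account balance"}, {"deduct"}),
    DesignClass({"receipt number"}, {"print receipt"}),
]
print(is_valid(problem, design))
```

Keeping the problem sets immutable and the class groupings mutable mirrors the text: the sets of data, actions and uses are fixed by the use cases, while the search only changes how attributes and methods are grouped.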
For the purposes of evolutionary search and exploration, design solutions are encoded directly into an object-oriented programming language (e.g. Java). Thus a single design solution could be considered in EC terms to be the equivalent of a chromosome. However, fitness values are computed directly against the design solution without the need to decode the individual. To fully evaluate the characteristics of genetic operators against the representation, the investigation was divided into two stages. Firstly, a single-objective GA was applied to evaluate both the variety promoting operators and a cohesion fitness function. Secondly, building upon the initial findings obtained, a multi-objective GA was developed. Section 3 of this article reports the results of the single-objective GA against an initial case study. Section 4 then reports the results of the multi-objective GA against a further industrial-scale case study. Limitations are discussed in section 5 and conclusions are presented in section 6.


Figure 1. Example of the representation of a design problem and a corresponding possible design solution.

3. Single-objective genetic algorithms

3.1 Fitness

The software engineering community has widely applied metrics to both software designs and source codes in an attempt to quantify various properties of software, including structural integrity. A number of structural design properties of software have been widely investigated and it is generally accepted that good indicators of software design structural integrity are high cohesion within classes and low coupling among classes. A number of surveys of the use of cohesion and coupling metrics have been conducted and frameworks have been proposed (Chidamber and Kemerer 1994, Briand et al. 1999). In the process of conceptual software design, a number of human designer assessments contribute to the overall structural integrity of the software design, i.e. software design is inherently multi-objective. For the purposes of this article, however, it is necessary to select a reduced number of conflicting metrics to illustrate the likely characteristics of the search. In selecting metrics, it is important that a metric (i) may be applied to upstream conceptual models rather than downstream programming language source code and (ii) is efficient to compute. Based on this, the Cohesiveness of Methods (COM) metric (Harrison et al. 1998) was selected as the basis of measuring the cohesion of individual conceptual designs. According to its proponents, COM is ‘for each attribute, the sum of all the methods using an attribute divided by the total number of methods, all divided by the number of attributes in the class’. Values of COM lie in the range 0 to 1. Expressed another way, consider a resulting class $C$ and define $A_C$, $M_C$ as the set of attributes and methods, respectively, that are contained in class $C$. Then the COM fitness for the class $C$, denoted by $f(C)$, is given by:

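$$f(C) = \frac{1}{|A_C|\,|M_C|} \sum_{i \in A_C,\; j \in M_C} \delta_{ij}$$

where

$$\delta_{ij} = \begin{cases} 1 & \text{if method } j \text{ uses attribute } i, \\ 0 & \text{otherwise.} \end{cases}$$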
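The COM formula above translates directly into code. The sketch below assumes that uses are stored as (method, attribute) pairs; the function names and sample data are hypothetical, and averaging COM over the classes of a design is an assumed aggregation (suggested by the averages reported in the tables) rather than something the text defines.

```python
def com(attributes, methods, uses):
    """COM cohesion f(C) of one class.

    `uses` is a set of (method, attribute) pairs; delta_ij = 1 when the
    pair (method j, attribute i) is present.
    """
    if not attributes or not methods:
        raise ValueError("a class needs at least one attribute and one method")
    used = sum(1 for i in attributes for j in methods if (j, i) in uses)
    return used / (len(attributes) * len(methods))


def mean_com(design, uses):
    """Average COM over the classes of a design (an assumed aggregation)."""
    return sum(com(a, m, uses) for a, m in design) / len(design)


# Hypothetical data: names are illustrative, not taken from the case studies.
uses = {("m1", "a1"), ("m1", "a2"), ("m2", "a1"), ("m3", "a3")}
design = [({"a1", "a2"}, {"m1", "m2"}), ({"a3"}, {"m3"})]
print(com({"a1", "a2"}, {"m1", "m2"}, uses))  # 3 of 4 pairs used -> 0.75
print(mean_com(design, uses))                 # (0.75 + 1.0) / 2 -> 0.875
```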

3.2 Single-objective genetic operators

For the conceptual software design representation used in this article, the genetic operators investigated were inspired by approaches that might be broadly categorized as GAs and evolutionary programming (EP). Operators such as those used in evolutionary strategies and genetic programming were also considered but rejected as they appeared to be poorly suited to the object-based representation of conceptual software design search space. (Evolutionary strategies have been typically applied to real parameter value representations, while genetic programming representations are tree-like structures of terminals and functions.)

3.2.1 GA-inspired operators. The selection, crossover and mutation operators used in this investigation are inspired by the simple binary string GA example presented by Goldberg (1989) and reiterated by Deb (2001). Selection is performed by two techniques, namely tournament selection and roulette-wheel selection. Recombination is achieved by means of a trans-positional crossover (TPX) in which two individuals are chosen at random from the population, and their attributes and methods are swapped between the two based on their class position within the individuals. For example, if an attribute is found in the first class of the first individual and the last class of the second, the attribute is relocated to the last class in the first individual and to the first class in the second. An example of TPX is shown in figure 2. However, a constraint of the search space is that each class must contain at least one attribute and method. Thus positional swapping can only occur where swapping an attribute or method to another class would not leave the class lacking attributes or methods. The approach taken to mutation is inspired by the simple GA (Goldberg 1989). Thus, a single individual is mutated by relocating an attribute and a method from one class to another.
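The trans-positional crossover described above can be sketched as follows. This is an illustrative reading rather than the authors' code: each individual is assumed to be a mapping from element name to class index, every element is considered once in a fixed order, a swap is made only if it is legal in both individuals, and a swap is skipped when the target position does not exist in the other individual. The names `tpx` and `_can_move` are hypothetical.

```python
def _can_move(individual, kinds, elem, target):
    """True if `elem` may legally move to class index `target`."""
    n_classes = max(individual.values()) + 1
    if target >= n_classes:
        return False  # assumed: skip positions this individual does not have
    src = individual[elem]
    peers = sum(1 for e, c in individual.items()
                if c == src and kinds[e] == kinds[elem])
    return peers > 1  # the source class must keep one element of this kind


def tpx(parent_a, parent_b, kinds):
    """Swap the class positions of elements between two individuals.

    parent_a, parent_b: dict mapping element name -> class index.
    kinds: dict mapping element name -> "attr" or "meth".
    """
    child_a, child_b = dict(parent_a), dict(parent_b)
    for elem in kinds:
        pos_a, pos_b = child_a[elem], child_b[elem]
        if pos_a == pos_b:
            continue  # same position in both: nothing to exchange
        if (_can_move(child_a, kinds, elem, pos_b)
                and _can_move(child_b, kinds, elem, pos_a)):
            child_a[elem], child_b[elem] = pos_b, pos_a
    return child_a, child_b


# Hypothetical individuals with two classes each (indices 0 and 1).
kinds = {"a1": "attr", "a2": "attr", "a3": "attr",
         "m1": "meth", "m2": "meth", "m3": "meth"}
parent_a = {"a1": 0, "a2": 0, "a3": 1, "m1": 0, "m2": 1, "m3": 1}
parent_b = {"a1": 1, "a2": 0, "a3": 1, "m1": 0, "m2": 0, "m3": 1}
child_a, child_b = tpx(parent_a, parent_b, kinds)
print(child_a)
print(child_b)
```

In this example only `a1` and `m2` differ in position between the parents, so they are the only elements exchanged; all other elements stay where they are.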
The GA-like mutation also complies with the ‘at least one attribute and one method’ constraint, in that attributes/methods are not taken from classes with only one attribute/method.

Figure 2. Example of GA-inspired trans-positional crossover.

3.2.2 EP-inspired operators. The offspring creation and mutation operators used in this investigation are inspired by Fogel et al. (1966). Potential offspring are created by mutating individuals from within the current population. To determine which individuals survive into the next generation, each individual takes part in a COM fitness tournament against 20 individuals selected at random from the combined populations of current and mutated individuals. A score is awarded to each individual based on the number of tournaments won, and each individual is ranked in the population according to their score. Highest ranking individuals progress to the next generation. Mutation is performed by two mechanisms: class-level mutation and element-level mutation. At class level, all attributes and methods of a class in an individual are swapped as a group with another class selected at random. At element level, elements (attributes and methods) in an individual are swapped at random from one class to another, with the following exception. Element-level mutation, like GA-inspired mutation, is limited by the ‘at least one attribute and one method per class’ constraint. Neither attributes nor methods are swapped where removing an attribute or method from the class would leave the class without an attribute or method (figure 3).

Figure 3. Example of EP-inspired mutation at both class and element level.

3.2.3 Process of single-objective evolutionary algorithms. Initialization of the population is achieved by allocating a number of classes to each individual design at random, within a range derived from the number of attributes and methods. Allocating the number of classes at random is necessary (and desirable from a search point of view) as the ‘correct’ number of classes cannot be known a priori. All attributes and methods from the sets of attributes and methods are then allocated to classes within individuals at random. The processes of the single-objective evolutionary algorithms used in this article are illustrated by flowcharts in figure 4.

3.3 Case study – cinema booking system

3.3.1 Background. The first case study is a generalized cinema booking system, derived from a number of established internet-based cinema booking systems existing in the UK. This case study has been used by the first author as a software design problem for second-year undergraduate software engineering students on many occasions.
The design problem of the cinema booking system addresses, for example, making an advance booking for a showing of a film and the collection of tickets on attending the cinema auditorium. A specification of the use cases of the cinema booking system design problem is available (Simons 2005a). The cinema booking system design problem comprises 15 actions, 16 data and 39 uses.

Figure 4. Flowchart of GA-inspired and EP-inspired evolutionary processes.
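Two steps of the EP-inspired process in section 3.2.2, element-level mutation and the survival tournament, can be sketched as below. Several details are assumptions made for illustration and are not stated in the text: one element is moved per mutation call, opponents are sampled with replacement, a tournament is won only on strictly higher fitness, and ties in score keep their original order. The function names are hypothetical.

```python
import random


def mutate_element(individual, rng):
    """Move one attribute or method to another class, if a legal move exists.

    `individual` is a list of (attributes, methods) pairs of sets.
    """
    child = [(set(a), set(m)) for a, m in individual]
    kind = rng.randrange(2)  # 0 selects attributes, 1 selects methods
    # Only classes holding more than one element of this kind may donate.
    donors = [i for i, c in enumerate(child) if len(c[kind]) > 1]
    if len(child) < 2 or not donors:
        return child
    src = rng.choice(donors)
    dst = rng.choice([i for i in range(len(child)) if i != src])
    elem = rng.choice(sorted(child[src][kind]))
    child[src][kind].remove(elem)
    child[dst][kind].add(elem)
    return child


def ep_survivors(fitnesses, n_survivors, n_opponents=20, rng=None):
    """Return indices of the individuals that progress to the next generation.

    `fitnesses` holds the COM fitness of every individual in the combined
    population of current and mutated individuals.
    """
    rng = rng or random.Random()
    scores = []
    for f in fitnesses:
        opponents = [rng.randrange(len(fitnesses)) for _ in range(n_opponents)]
        scores.append(sum(1 for j in opponents if f > fitnesses[j]))
    # Rank by number of tournaments won; the stable sort keeps ties in order.
    ranked = sorted(range(len(fitnesses)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_survivors]


rng = random.Random(0)
parent = [({"a1", "a2"}, {"m1"}), ({"a3"}, {"m2", "m3"})]
child = mutate_element(parent, rng)
print(child)
print(ep_survivors([1.0] + [0.0] * 9, n_survivors=3, rng=rng))
```

Because donors are restricted to classes with more than one element of the chosen kind, every mutated individual still satisfies the ‘at least one attribute and one method per class’ constraint.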

Figure 5. Cinema booking system manual conceptual design.

Table 1. Cohesion values of cinema booking system manual design.

Design concept (class)    COM cohesion value
Film                      0.75
Showing                   0.2185
Screen                    1.0
Booking                   0.555
Payment                   0.625
Average                   0.62975

During the teaching of this design problem to students over a number of years, it is noticeable that a number of classes are repeatedly identified. Taking the actions and data specified in the design problem (Simons 2005a), an amalgam of these recurring classes is shown in figure 5. Various cohesion values of the manual design classes for the cinema booking system are shown in table 1.

3.3.2 Evolutionary search results. Optimum settings for parameters were chosen empirically by trial and error when used with the software design search space representation and are as follows:

(i) selection – either tournament or roulette-wheel for GA-inspired operators, or sequential (i.e. no selection) for the EP-inspired variant;
(ii) crossover and mutation probabilities – (0.7,0.03) for GA-inspired operators and (0.0,1.0) for the EP-inspired variant;
(iii) offspring creation and replacement strategy – (100,100) for GA-inspired operators (i.e. 100 parents generate 100 offspring and only those offspring become parents of the next generation) and (100,100) replacement for the EP-inspired variant.

Both the average COM fitness of every generation and the maximum COM fitness of any individual within the population of a generation were recorded as the evolutionary search for the cinema booking system case study evolved. For each algorithm, ten evolutionary searches were conducted, the population being initialized at random prior to each evolutionary search. At the end of the evolutionary search, ‘convergence’ occurred if one or more individuals of superior fitness took over the entire population, i.e. despite the variation pressure, fitter individuals no longer emerged during the remainder of the evolutionary search. The mean average COM fitness results of the ten searches are shown in table 2, together with the average number of classes found in the final generation of the search, and an indication of the generation number at which convergence occurred. All COM fitness values are supplied with values of the standard deviation from the mean.

Table 2. Population average search results.

                                                  GA (tournament) variant    GA (roulette-wheel) variant    EP variant
                                                  COM fitness   Std Dev      COM fitness   Std Dev          COM fitness   Std Dev
Average population fitness for final generation   0.6436        0.047        0.6121        0.0647           0.6137        0.0723
Average number of classes in final generation     7.2           0.9375       7.4           0.8447           4.1           0.7496
Generation at which convergence occurs            31.9          3.9835       49.4          8.46             974.46        26.48

For the GA variant operators, the average COM fitness for the final generation for both tournament and roulette-wheel selection is similar, as is the average number of classes in the final generation. However, convergence to a local optimum is quicker with tournament selection (generation 32) than roulette-wheel (generation 49). Figure 6 shows the average and maximum fitness achieved as the population evolved for GA search. Both selection operators are shown for comparison. Figure 6 also reveals that the average and maximum COM fitness of the GA population with roulette-wheel selection lagged behind tournament in terms of generation number. This finding is consistent with authors such as Deb (2001) who report that tournament selection has better or equivalent convergence properties when compared with other reproduction operators in the literature.

Figure 6. COM fitness for GA variant, single run.

It seems likely that a contributory factor to the rapid convergence of GA is the limitations of the TPX operator. There are two circumstances where TPX fails to exchange information between two individuals: (i) where the position of a method or attribute is the same in both individuals; (ii) where the method or attribute is the only method or attribute in an individual. Such constraints are necessary to ensure that the consistency of the sets of attributes and methods in an individual is not violated.

For the EP variant operators, the average population COM fitness in the final generation is similar to that achieved by the GA variant operators. It is notable how similar the final average COM fitness values of the three EAs are, although the variance about the values increased from GA tournament, to GA roulette-wheel, to EP.
However, EP differed significantly from GA in terms of the average number of classes per individual in the final generation (four for EP as opposed to seven for GA), and the generation at which convergence occurred (974 for EP, 49 for GA roulette-wheel, 32 for GA tournament). The average and maximum fitness achieved for EP variant operators is shown in figure 7. The higher number of generations required to achieve convergence and the finding that convergence involved average COM fitness values repeating over a five to ten generation cycle with no single individual taking over the population suggest that the balance between exploitation and exploration pressures was different for the EP variant. Additionally, reading the standard deviation values of average COM fitness in the final generation from table 2, a greater variety among individuals is evident with the EP variant operators. This may be explained by the greater exploration of the search space achieved by EP mutation. Unlike TPX, there are no circumstances in which EP class-level mutation fails to perturb an individual’s attributes and methods, resulting in a more effective exploration of the search space. The greater exploration of the search space achieved by EP explains why the number of generations required to achieve convergence is greater in EP than in GA.

Figure 7. COM fitness for EP variant, single run.

It is evident that mutation at both class and element levels is better suited to the search space representation insofar as mutation inherently respects the invariance of the sets of attributes and methods encapsulated in their classes. This contrasts with TPX, which violates class encapsulation. Thus the EP-inspired mutation operator proves to be algorithmically simpler and more efficient at promoting variety than TPX.

A comparison of the average generation COM fitness achieved by the evolutionary searches of the three evolutionary algorithms is shown in figure 8, while figure 9 shows the COM cohesion value for the human-performed design with the average COM cohesion values given previously in table 2. In terms of COM cohesion values, the genetic operators produced conceptual software designs of similar cohesion to human performance. Comparing the classes produced by the evolutionary algorithms with the human-performed design reveals that classes with high COM cohesion values, i.e. Screen, Film, are typically apparent in the design variants produced by the evolutionary algorithms, with COM cohesion values of 1.00 and 0.75 respectively. However, human-identified classes of lower COM cohesion values (i.e. Showing and Booking) are less frequently evident.

Figure 8. Average COM fitness for three EAs, single run.

Figure 9. Fitness of human design, EA search.

It seems logical to suggest that human-performed designs are reached by the human designer taking account of multiple fitness functions during manual design variant evaluation. We speculate that such fitness functions may include cohesion of individual classes, coupling between classes in the design, and the ‘granularity’ of the classes present, reflected by their size (in terms of the number of attributes and methods they contain). We also note that there exists an inherent conflict between class cohesion and the size of a class, i.e. larger classes tend to have higher cohesion values while smaller classes tend to have lower cohesion values. In light of this, we conjecture that a multi-objective search may be better suited for support of the design processes of the human designer.

4. Multi-objective genetic algorithm

4.1 Fitness

From the results obtained from the single-objective GA, it is evident that there is an issue with the COM metric, i.e. there is no consideration of class size. For example, consider a class with one attribute and one method. If the method uses the attribute, the value of COM is 1. However, consider a class with two attributes and two methods, where both methods use both attributes. Again, the value of COM is 1, thus the size of the class has no impact on the value achieved. A number of attempts were therefore tested to better capture class cohesion by reflecting class size. These attempts include multiplying the COM value by:

(i) the number of attributes and methods in the class;
(ii) the square root of the number of attributes and methods in the class;
(iii) the number of uses in the class; and
(iv) the square root of the number of attributes and methods in a class.
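As a worked check of the two example classes, the short calculation below evaluates COM and the first three multipliers. It assumes that ‘the number of attributes and methods’ means $|A_C| + |M_C|$ and that ‘the number of uses’ means $\sum \delta_{ij}$ within the class; the text does not define either quantity explicitly, so both readings are illustrative.

```latex
\begin{align*}
\text{one attribute, one method:} \quad & f(C_1) = \frac{1}{1 \cdot 1} \cdot 1 = 1 \\
\text{two attributes, two methods:} \quad & f(C_2) = \frac{1}{2 \cdot 2} \cdot 4 = 1 \\
\text{(i)} \quad & 2\, f(C_1) = 2, \qquad 4\, f(C_2) = 4 \\
\text{(ii)} \quad & \sqrt{2}\, f(C_1) \approx 1.41, \qquad \sqrt{4}\, f(C_2) = 2 \\
\text{(iii)} \quad & 1 \cdot f(C_1) = 1, \qquad 4 \cdot f(C_2) = 4
\end{align*}
```

Under these assumptions each multiplier separates the two classes, whereas unscaled COM scores both as 1.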

To provide an indication of class size, the second fitness function selected is the number of classes in the design solution. The premise here is that the fewer the number of classes in the design, the greater the number of attributes and methods the classes must contain and vice versa. Class size conflicts with cohesion in the following manner. Logically speaking, the tendency of the COM metric is such that if methods and attributes are grouped in a small number of classes, the COM value will be high. Indeed, if all methods and attributes are grouped in one class, the COM value will be maximum, i.e. 1. In reality, however, no class designs of significant scale are composed of one class. Thus, to counter this, the number of classes in a design solution is measured and a design solution with a higher number of classes is preferred to a design solution with fewer classes.

4.2 Multi-objective genetic operators

Within the framework of EC, it is important to recognize the characteristics of a search space representation and to select the genetic operators of the evolutionary algorithm that are best suited to it. Based on the results obtained from the single-objective GA investigations, the diversity-promoting operator chosen is mutation, as this operator appears to promote greater solution variety and is computationally more efficient. The optimization and diversity preservation operators used to investigate a multi-objective search approach are inspired by the elitist Non-Dominated Sorting Genetic Algorithm (NSGA-II) proposed by Deb (2001). In NSGA-II, an offspring population of the same size is created from the parent population. The combined population is non-domination sorted into ordered ‘fronts’ of equivalent optimality. The new population is filled by solutions of different non-dominated fronts, one at a time: the filling starts with the best front, and continues with the next best and so on until the population size is met or exceeded. Solutions that do not make the new population are discarded. Typically, the last front contains more solutions than there is space available. In selecting which solutions go forward, a ‘crowding distance sorting’ operator is employed to ensure that the most diverse range of solutions is preserved. In the early stages of evolution, many fronts are evident in the population. However, as evolution proceeds, the number of fronts decreases until eventually a small number of fronts, including the Pareto-optimal front, remains. At this point, the crowding distance sorting operator is crucial in preserving diversity of solutions.
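The front-by-front filling described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses a straightforward repeated-scan sort rather than the fast non-dominated sort of NSGA-II, treats both objectives (COM cohesion and number of classes) as maximized, and uses hypothetical function names and sample values.

```python
def dominates(p, q):
    """p dominates q: no worse in every objective, better in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def non_dominated_sort(points):
    """Split point indices into ordered fronts of equivalent optimality."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in `front`; boundaries get infinity."""
    dist = {i: 0.0 for i in front}
    for k in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (hi - lo)
    return dist


def next_population(points, size):
    """Fill the new population front by front, cutting the last front
    by descending crowding distance."""
    survivors = []
    for front in non_dominated_sort(points):
        if len(survivors) + len(front) <= size:
            survivors.extend(front)
        else:
            dist = crowding_distance(points, front)
            front.sort(key=lambda i: dist[i], reverse=True)
            survivors.extend(front[: size - len(survivors)])
            break
    return survivors


# Hypothetical (COM cohesion, number of classes) pairs for six designs.
points = [(0.9, 2), (0.7, 4), (0.5, 6), (0.6, 3), (0.4, 5), (0.3, 1)]
print(non_dominated_sort(points))  # [[0, 1, 2], [3, 4], [5]]
print(next_population(points, 4))  # [0, 1, 2, 3]
```

In the sample data, designs 0, 1 and 2 trade cohesion against number of classes without any one dominating another, so together they form the first front.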
4.3 Parameter settings

Optimum settings for multi-objective parameters were chosen empirically for NSGA-II-inspired genetic operators when used with the software design search space representation as follows:

(i) selection – no selection performed;
(ii) crossover and mutation probabilities – (0.0, 1.0) (i.e. no crossover performed);
(iii) offspring creation and replacement strategy – (200, 200) (i.e. 200 parents generate 200 offspring by mutation), and (200, 200) replacement (i.e. the least dominated 200 individuals out of the combined population of 400 go forward to the next generation).

4.4 Case study – Select Cruises

4.4.1 Background. The second case study, derived from Apperley et al. (2003), is based on an industrial case study relating to a cruise company selling adventure holidays on tall ships in the Pacific Ocean where passengers are members of the crew. Cruises comprise ‘legs’ or ‘passages’ from island to island, e.g. Auckland to Tonga, Tonga to Vava’u, etc. Computer support is required for processing sales enquiries and berth bookings. A specification of the use cases derived from the Select Cruises design problem is available from Simons (2005b). The Select Cruises design problem comprises 29 actions, 62 data and 144 uses.

4.4.2 Manual conceptual design. Applying the representation described previously in section 2 of this article manually to the Select Cruises design problem yields the classes shown in figure 10. The interested reader is referred to Apperley et al. (2003, p. 186) for the class model provided in the case study, to which all the classes in figure 10 can be traced.

Figure 10. Manual conceptual design of Select Cruises classes.

4.4.3 Cohesion of manual design. Various cohesion values of the classes of the manual design for Select Cruises are shown in table 3.

4.4.4 Evolutionary search results. Table 4 shows the various cohesion values achieved for the evolutionary simulations for the Select Cruises case study, together with the convergence generation and the time taken to achieve convergence in seconds. Figures presented are the averages of ten evolutionary runs, together with the standard deviation value. The column headings are:


Table 3. Cohesion values of the Select Cruises manual design.

CLASS         COM     M+A     COM.M+A  COM.√M+A  uses  COM.uses  COM.√uses
Ship          1       6       6        2.4494    8     8         2.8284
Berth         0.83    7       5.81     2.1959    10    8.3       2.3912
Cruise        1       9       9        3         18    18        4.2426
Passage       1       11      11       3.3166    24    24        4.8989
BerthAll      0       2       0        0         0     0         0
Booking       0.875   6       5.25     2.1433    7     6.125     2.1655
Customer      1       14      14       3.7416    24    24        4.8989
Credit Card   1       7       7        2.6457    6     6         2.4494
Preferences   1       6       6        2.4494    5     5         2.2360
MailRequest   1       7       7        2.6457    8     8         2.8284
Mailing       1       5       5        2.2360    4     4         2
Transaction   0       1       0        0         0     0         0
Inquiry       0       0       0        0         0     0         0
Quote         0.8888  7       6.2216   2.3515    8     7.1104    2.3700
Sale          1       5       5        2.2360    6     6         2.4494
L. Template   0       0       0        0         0     0         0
Average       0.7246  5.8125  5.4551   1.9632    8     7.7834    2.2349
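As a worked check of how the size-weighted columns of table 3 relate to the COM and M+A columns, the following Python sketch recomputes COM.M+A and COM.√M+A and their averages. The COM and M+A values are copied from table 3; the helper names `size_weighted` and `column_averages` are hypothetical, and reading COM.√M+A as COM multiplied by the square root of M+A is an interpretation that the tabulated values bear out.

```python
from math import sqrt

# (class, COM, M+A) as listed in table 3 for the manual design.
ROWS = [
    ("Ship", 1.0, 6), ("Berth", 0.83, 7), ("Cruise", 1.0, 9),
    ("Passage", 1.0, 11), ("BerthAll", 0.0, 2), ("Booking", 0.875, 6),
    ("Customer", 1.0, 14), ("Credit Card", 1.0, 7), ("Preferences", 1.0, 6),
    ("MailRequest", 1.0, 7), ("Mailing", 1.0, 5), ("Transaction", 0.0, 1),
    ("Inquiry", 0.0, 0), ("Quote", 0.8888, 7), ("Sale", 1.0, 5),
    ("L. Template", 0.0, 0),
]


def size_weighted(com, size):
    """Return (COM.M+A, COM.sqrt(M+A)) for one class."""
    return com * size, com * sqrt(size)


def column_averages(rows):
    """Average COM, M+A, COM.M+A and COM.sqrt(M+A) over all classes."""
    n = len(rows)
    weighted = [size_weighted(com, size) for _, com, size in rows]
    return {
        "COM": sum(com for _, com, _ in rows) / n,
        "M+A": sum(size for _, _, size in rows) / n,
        "COM.M+A": sum(w[0] for w in weighted) / n,
        "COM.sqrt(M+A)": sum(w[1] for w in weighted) / n,
    }


averages = column_averages(ROWS)
berth_linear, berth_root = size_weighted(0.83, 7)
```

For the Berth class, for instance, 0.83 × 7 = 5.81 and 0.83 × √7 ≈ 2.1959, and the four averages round to 0.7246, 5.8125, 5.4551 and 1.9632, matching the first four entries of the Average row.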

Table 4. Cohesion values achieved for the Select Cruises case study.

Metric     Average  Std Dev  Best     Std Dev  Manual  Conv  Time
COM        0.2629   0.0068   0.3305   0.0099   0.7246  5000  1104.53
COM.M+A    6.3030   3.2578   8.6797   0.9875   5.4551  2000  261.40
COM.√M+A   1.5684   0.7819   2.0030   0.3552   1.9632  3000  682.00
COM.uses   8.4145   2.6531   11.0240  2.2964   7.7834  2000  255.26
COM.√uses  1.6882   1.3259   2.3225   0.8247   2.2349  3000  694.28

(i) Average – the average population fitness value
(ii) Std Dev – standard deviation
(iii) Best – the fitness value of the best design solution in the population
(iv) Manual – the cohesion value of the manual design shown for comparison
(v) Conv – the greatest generation number at which convergence occurs, i.e. convergence may occur prior to this generation number, but never after
(vi) Time – the average time taken to arrive at convergence in seconds.
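The column definitions above can be illustrated with a short Python sketch that summarises a set of runs into one table row. Everything here is an assumption made for illustration: the per-run record keys, the function name `summarise_runs`, the sample data, and the use of the sample standard deviation (`statistics.stdev`), since the article does not state which form of standard deviation was used.

```python
from statistics import mean, stdev


def summarise_runs(runs):
    """Collapse per-run results into the columns Average, Std Dev, Best, Conv and Time.

    Each run is a dict with hypothetical keys:
      avg_fitness      average population fitness at convergence
      best_fitness     fitness of the best design solution in the population
      conv_generation  generation at which the run converged
      seconds          wall-clock time taken to converge
    """
    return {
        "Average": mean(r["avg_fitness"] for r in runs),
        "Average Std Dev": stdev(r["avg_fitness"] for r in runs),
        "Best": mean(r["best_fitness"] for r in runs),
        "Best Std Dev": stdev(r["best_fitness"] for r in runs),
        # Conv is the greatest convergence generation seen in any run.
        "Conv": max(r["conv_generation"] for r in runs),
        "Time": mean(r["seconds"] for r in runs),
    }


# Three hypothetical runs (the article reports averages over ten).
runs = [
    {"avg_fitness": 6.0, "best_fitness": 8.0, "conv_generation": 1500, "seconds": 200.0},
    {"avg_fitness": 7.0, "best_fitness": 9.0, "conv_generation": 2000, "seconds": 260.0},
    {"avg_fitness": 8.0, "best_fitness": 10.0, "conv_generation": 1800, "seconds": 230.0},
]
summary = summarise_runs(runs)
```

Taking the maximum for Conv reflects the definition in (v): individual runs may converge earlier than the reported generation, but none converges later.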

Comparing the average population cohesion fitness with the manual designs reveals COM metric values that are lower for the evolutionary simulations than for the manual designs. However, for cohesion metrics that take account of class size, a broad similarity between the average population cohesion fitness and the manual designs is evident. Values achieved by the COM.M+A and COM.uses cohesion metrics are higher than the manual design cohesion values, while COM.√M+A and COM.√uses values are lower. Comparing the cohesion values of the best design solution in the population with the manual designs reveals that COM values are lower for the evolutionary simulations. However, for cohesion fitness metrics that take account of class size, the COM.M+A, COM.√M+A and COM.uses metrics produce higher values, while COM.√uses produces lower values. Taken together, the results of the average population fitness and best design solution fitness suggest that taking account of class size when calculating cohesion fitness produces values more comparable with manual designs than not taking account of class size.

The results for the number of generations required to achieve convergence in a simulation run reveal the effect of case study scale and cohesion metric. Simulation runs with the COM metric require more generations to reach convergence than the simulation runs involving cohesion metrics that take account of size. This is explained by the observation that the range in cohesion values for metrics that take account of size is much greater than for COM (see table 4). For the metrics that take account of class size, design solutions of high cohesion produce proportionately high cohesion values and hence the design solution is more likely to be non-dominated and thus survive to the next generation, increasing population fitness more quickly as the generations evolve. In addition, for simulation runs involving cohesion metrics that take account of size, the effect of case study scale is evident in the higher number of generations required for convergence in the Select Cruises case study. However, for metrics that take account of class size, the time to achieve convergence is no more than 12 minutes (and often less), which for a non-trivial case study is not an overly lengthy period to provide feedback to the human designer and opportunity for interaction.

For the four cohesion metrics that take account of class size, the maximum fitness (‘best-so-far’) graphs of the evolutionary simulation runs are similar. A single run best-so-far graph for the COM.M+A metric for the Select Cruises case is shown in figure 11. The simulation runs also reveal that a single Pareto-optimal front is achieved for all cohesion metrics that take account of class size, while the COM metric converges on many fronts. A typical Pareto-optimal front using the COM.M+A metric for Select Cruises is shown in figure 12. A significant factor determining the usefulness of the Pareto-optimal front is the integer scale reflecting the number of classes in the design solution.

Manually examining the designs produced by the evolutionary runs, a difference is observed in the design solutions produced by the four metrics that account for class size, when compared with the metric that does not. The design solutions produced by the COM metric include a large proportion of classes containing one attribute and one method. To the designer’s eye, it appears that some of the ‘one plus one’ classes may be coalesced to produce a class closer to the manual designs. Metrics that account for size identify classes of high cohesion such as ‘Customer Details’, ‘Cruise’, ‘Passage’, and ‘Credit Card Detail’ but struggle to identify other classes that exhibit lower cohesion.
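The manual inspection for ‘one plus one’ classes described above could be supported by a simple filter such as the following Python sketch. The design representation (a dictionary mapping each class name to a pair of attribute and method sets), the function name `one_plus_one_classes` and all attribute, method and class names in the sample are hypothetical; they are not taken from the evolved designs.

```python
def one_plus_one_classes(design):
    """Return the names of classes holding exactly one attribute and one method."""
    return sorted(
        name
        for name, (attributes, methods) in design.items()
        if len(attributes) == 1 and len(methods) == 1
    )


# Hypothetical design fragment; names are illustrative only.
design = {
    "Cruise": ({"CruiseName", "StartDate"}, {"ListCruises"}),
    "ClassA": ({"PortName"}, {"ShowPort"}),
    "ClassB": ({"BerthNumber"}, {"ReserveBerth"}),
}
candidates = one_plus_one_classes(design)
```

Classes flagged in this way are only candidates for coalescing; whether any two of them belong together remains a judgement for the designer.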

Figure 11. Select Cruises ‘best-so-far’ graph, single run.

Figure 12. Select Cruises Pareto-optimal front.

Interestingly, in the Select Cruises case study, a number of solution variants emerge that produce classes of high cohesion fitness not suggested in the manual design. For example, the attribute ‘DateTransacted’ is grouped in the ‘Sale’ class while retaining a COM value of 1.0 for the ‘Sale’ class. This might suggest to the human designer that the manually identified class of ‘Transaction’, without methods, is a poor candidate class. Either additional methods are required for the ‘Transaction’ class, or the class might be removed from the design. In another example, the attributes ‘DateBerthReserved’ and ‘DateBerthBooked’ are grouped in the ‘Berth’ class in one solution variant. Although the COM value of the ‘Berth’ class drops from 0.83 to 0.6 with this grouping, a class of zero cohesion is eliminated from the design.

5. Limitations

From the results produced for the two case studies, it is evident that the generation of classes within evolutionary design solutions is limited by two factors. Firstly, the specification of the design problem via use cases is crucial, as the representation of the design solution can only be as effective as the fidelity of the representation of the design problem. Although the representations of the design problem and solution are highly traceable, the expression of the steps in the use case narrative influences the design solution. Furthermore, some authors (e.g. Wirfs-Brock and McKean 2003) imply that use cases may not provide a complete basis for an object-oriented class design in that a human designer, through mental conceptualizations, may arrive at valid abstractions that have no systematic relationship to the specified design problem. The representation described in this article does not cater for such inventions, although it is envisaged that human interaction with search and exploration tool support will afford such opportunities. Secondly, it is clear that while the cohesion metrics investigated in this article have produced interesting cohesive class design solutions, they are by no means a complete reflection of the inherently multi-objective evaluations conducted by a human designer. The evolutionary design variants produced are thus highly dependent on the extent and choice of metrics employed during search and exploration. Although class cohesion and size are used in the multi-objective GA, future incorporation of a coupling metric to address class relationships (such as association and inheritance) is envisaged.

6. Conclusions

For both the single and multi-objective GAs investigated, the conceptual design solutions produced show a broad correspondence to the manual conceptual designs with respect to cohesion results. However, the conceptual design solutions produced by the multi-objective GA appear to correspond better visually to the manually produced designs, since the cohesion fitness measure takes account of class size and the evaluation of solution variants is multi-objective. Furthermore, the multi-objective search produces a variety of interesting design variants not arrived at by manual design. Indeed, a large number and variety of design solution variants of equivalent optimality are produced. We conclude that this is significant as this is a typical characteristic of search results obtained by EC approaches applied in conceptual engineering design. Conceptual software designs stimulate the designer to understand how the class structure of the software-to-be reflects the design problem. We speculate that this understanding may be greatly enhanced by supporting the designer with the presentation of alternative conceptual designs of equivalent optimality produced by evolutionary search. Where manual and machine-generated designs of equivalent optimality differ, this is useful and interesting to the designer in that it stimulates greater understanding of the design problem. Indeed, conceptual designs of equivalent optimality are rarely right or wrong in an absolute sense, but are more or less useful to the designer. We conclude that the use of cohesion and class size as objective fitness functions within the multi-objective GA produces promising results in terms of useful and interesting conceptual class designs. Results of investigations of the GAs in this article also indicate that the execution times of the evolutionary design simulations for non-trivial design problems appear favorable. Taking favorable performance together with the generation of useful and interesting conceptual class designs, we also conclude that this approach, inspired by engineering design search approaches, may be taken forward as the basis of computational tool support for interactive human/machine search and exploration of the conceptual software design space.

References

Apperley, H., Hofman, R., Latchem, S., Maybank, B., McGibbon, B., Piper, C. and Simons, C.L., Service- and Component-based Development, 2003 (Addison-Wesley: Reading, MA).
Briand, L.C., Daly, J.W. and Wust, J.K., A unified framework for coupling measurement in object-oriented systems. IEEE Trans. Softw. Eng., 1999, 25(1), 91–121.
Brooks, F.P. Jr., The Mythical Man Month, 20th anniversary edn, 1995 (Addison-Wesley: Reading, MA).
Chidamber, S.R. and Kemerer, C.F., A metrics suite for object-oriented design. IEEE Trans. Softw. Eng., 1994, 20(6), 476–493.
Cockburn, A., Writing Effective Use Cases, 2001 (Addison-Wesley: Reading, MA).
Deb, K., Multi-objective Optimization Using Evolutionary Algorithms, 2001 (Wiley: Chichester).
Fogel, L.J., Owens, A.J. and Walsh, M.J., Artificial Intelligence Through Simulated Evolution, 1966 (Wiley: Chichester).
Glass, R.L., Facts and Fallacies of Software Engineering, 2003 (Addison-Wesley: Reading, MA).
Goldberg, D.E., Genetic Algorithms for Search, Optimization, and Machine Learning, 1989 (Addison-Wesley: Reading, MA).
Guindon, R., Designing the design process: exploiting opportunistic thoughts. Human–Comput. Interact., 1990, 5(2–3), 305–344.
Harrison, R., Councell, S. and Nithi, R., An investigation into the applicability and validity of object-oriented design metrics. Emp. Softw. Eng., 1998, 3(3), 255–273.
Harman, M. and Jones, B., Search-based software engineering. Inf. Softw. Technol., 2001, 43(14), 833–839.
Jacobson, I., Christerson, M., Jonsson, P. and Overgaard, G., Object-oriented Software Engineering: A Use Case Driven Approach, 1992 (Addison-Wesley: Reading, MA).
Khoshgoftaar, T.M., Liu, Y. and Seliya, N., A multiobjective module-order model for software quality enhancement. IEEE Trans. Evol. Comput., 2004, 8(6), 593–608.
Liu, D., Subramaniam, K., Eberlien, A. and Behrouz, H., Natural language requirements analysis and class model generation using UCDA, in Proceedings of the 17th International Conference on Industrial and Engineering Application of Artificial Intelligence and Expert Systems (IEA/AIE 2004), 2004, pp. 295–304.
Lo, S.-C. and Chang, J.-H., Application of clustering techniques to software component architecture design. Int. J. Softw. Eng. Knowledge Eng., 2004, 14(4), 429–439.
Mich, L. and Garigliano, R., NL-OOPS: a requirements analysis tool based on natural language processing, in Proceedings of the Third International Conference on Data Mining (Data Mining III), 2002, pp. 321–330.
Parmee, I.C., Evolutionary and Adaptive Computing in Engineering Design, 2001 (Springer: Berlin).
Parmee, I.C., Improving problem definition through interactive evolutionary computation. Artif. Intell. Eng. Des. Anal. Manuf., 2002, 16(3), 85–202.
Parmee, I.C., Human-centric intelligent systems for exploration and knowledge discovery. Analyst, 2005, 130(1), 29–34.
Seng, O., Bauer, M., Beil, M. and Pache, G., Search-based improvement of subsystem decompositions, in Proceedings of the Genetic and Evolutionary Computing Conference 2005 (GECCO ’05), 2005, pp. 1045–1051.
Simons, C.L., Use Cases for Cinema Booking System, 2005(a). Available online at: www.cems.uwe.ac.uk/~clsimons/CaseStudies/CinemaBookingSystem.htm
Simons, C.L., Use Cases for Select Cruises, 2005(b). Available online at: www.cems.uwe.ac.uk/~clsimons/CaseStudies/SelectCruises.htm
Simons, C.L. and Parmee, I.C., Defining the Search Space for Conceptual Software Designs, 2005. Available online at: www.cems.uwe.ac.uk/~clsimons/Publications/Definingthesearchspace.pdf
Simons, C.L., Parmee, I.C. and Coward, D.P., 35 years on: to what extent has software engineering design achieved its goals? IEE Proc. –Softw., 2003, 150(6), 337–350.
Svetinovic, D., Berry, D.M. and Godfrey, M., Concept identification in object-oriented domain analysis: why some students just don’t get it, in Proceedings of the 13th International Conference of Requirements Engineering (RE ’05), 2005, pp. 189–198.
Wirfs-Brock, R.J. and McKean, A., Object Design: Roles, Responsibilities and Collaborations, 2003 (Addison-Wesley: Reading, MA).
