A Comparison of Two Memetic Algorithms for Software Class Modelling

Jim Smith
Department of Computer Science and Creative Technologies, University of the West of England, Bristol, UK
[email protected]

Christopher Simons
Department of Computer Science and Creative Technologies, University of the West of England, Bristol, UK
[email protected]
ABSTRACT
Recent research has demonstrated that the problem of class modelling within early cycle object orientated software engineering can be successfully tackled by posing it as a search problem to be tackled with meta-heuristics. This “Search Based Software Engineering” approach has been illustrated using both Evolutionary Algorithms and Ant Colony Optimisation to perform the underlying search. Each has been shown to display strengths and weaknesses - both in terms of how easily “standard” algorithms can be applied to the domain, and of optimisation performance. This paper extends that work by considering the effect of incorporating Local Search. Specifically we examine the hypothesis that within a memetic framework the choice of global search heuristic does not significantly affect search performance, freeing the decision to be made on other more subjective factors. Results show that in fact the use of local search is not always beneficial to the Ant Colony Algorithm, whereas for the Evolutionary Algorithm with order based recombination it is highly effective at improving both the quality and speed of optimisation. Across a range of parameter settings ACO found its best solutions earlier than EAs, but those solutions were of lower quality than those found by EAs. For both algorithms we demonstrated that the number of constraints present, which relates to the number of classes created, has a far bigger impact on solution quality and time than the size of the problem in terms of numbers of attributes and methods.

Keywords
Search-Based Software Engineering, Evolutionary Algorithms, Ant Colony Optimisation, Memetic Algorithms
1. INTRODUCTION
Recent research has demonstrated that the task of class modelling within early cycle object orientated software engineering can be successfully tackled by posing it as a search problem. This task has also been referred to as the Class Responsibility Assignment Problem. For the sake of brevity we will hereafter refer to it as “class modelling”, with the restriction to the context of the early stages of the development life cycle being taken as read. The “Search Based Software Engineering” (SBSE) approach to class modelling has been illustrated using both Evolutionary Algorithms (EA) and Ant Colony Optimisation (ACO) to perform the underlying search. Each has been shown to display strengths and weaknesses - both in terms of how easily “standard” algorithms can be applied to the domain, and of optimisation performance. Interestingly both of these population based methods have been shown to outperform methods based on a single improving solution.

It has been shown on a wide range of problems that the incorporation of local search as a mechanism for refining and improving candidate solutions can improve both the effectiveness (quality of solutions found) and efficiency (number of solutions examined) of population-based search heuristics [6]. This paper extends work on early-stage SBSE by considering the effect of incorporating Local Search. Specifically we examine the hypothesis that within a memetic framework the choice of global search heuristic does not significantly affect search performance.

Our contention is that because of its complex, subjective nature, ultimately class modelling should be tackled via interactive search, albeit augmented with a surrogate fitness function to prevent user fatigue. Therefore ideally the choice of search method should consider the ease with which it can support users’ input into the search process via actions such as “freezing” parts of designs.
These considerations are orthogonal to the comparative search performance, so understanding the latter is vital before we can disregard it. Simons and Smith [13, 14] have compared ACO and EAs as the basis for meta-heuristic search for class models, concluding that performance issues aside, there are practical reasons why one might prefer to use constructive heuristics such as ACO. For example, in an ACO “freezing” of partial solutions can be simply achieved via direct changes to
Categories and Subject Descriptors I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search—Evolutionary Algorithms, Ant Colony Optimisation, Memetic Algorithms; D.2.3 [Software Engineering]: Design Tools and Techniques—Search Based Software Engineering, Class Modelling
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO’13, July 6–10, 2013, Amsterdam, The Netherlands. Copyright 2013 ACM 978-1-4503-1963-8/13/07 ...$15.00.
the computational budget was reduced (as for example is often the case in interactive search) the situation was reversed. However, with both algorithms, and both representations examined, a major issue was dealing with the constraint that a valid class model should contain at least one attribute and at least one method. Those papers used penalty functions (all invalid models were given zero fitness) and the use of random regeneration of invalid solutions. Dealing more efficiently with this constraint would necessitate either a significant adaptation of the underlying global search heuristics, or the provision of a “repair” mechanism. A well designed Local Search algorithm that systematically examined the effect of moving elements between classes provides a simple way of providing the latter, and also of improving valid solutions, hence part of the motivation of this paper.
the pheromone table, whereas in an EA it would necessitate manipulation of the recombination and mutation operators on-the-fly. Some attempts have been made in this direction by interleaving phases of human manipulation and evolutionary adaptation [17]. Regardless of the mechanisms underpinning the search, clearly algorithmic simplicity cannot compensate for poor performance. Therefore this paper addresses the following questions:
• Is there a difference in the performance of global search meta-heuristics when applied to class modelling? If so, on what sort of problems and why?
• Does the addition of Local Search to create Memetic Algorithms improve performance, and if so is there still a difference in performance arising from the use of different global search heuristics?
3. REPRESENTATION AND ALGORITHMS
To answer these questions the rest of the paper is organised as follows: Section 2 provides a brief background to previous research in this problem domain. Section 3 describes the chosen representation and the meta-heuristics considered. Section 4 describes the experimental methodology used. Section 5 describes and analyses the results obtained. To conclude the paper, Section 6 summarises the findings and their implications for Search based Software Engineering in general, and early stage class modelling in particular.
2.
This section briefly overviews the way in which class modelling is posed as a search problem, and the three search algorithms used in this paper.
3.1 Representation
The creation of class models can be thought of as a process of grouping elements (attributes and methods) into classes, so that every class contains at least one element of each type. In order to map readily onto freely available ACO source code, we borrow a representation commonly used for the Vehicle Routing Problem. A solution is considered to be a permutation of a set of items, comprising the design elements plus “end-of-class” markers. If a problem has a attributes and m methods to be grouped into c classes, the representation is then a permutation of size a + m + c − 1. The value of the ith position is taken to denote the element that occurs in the ith place in the tour, where the attributes are numbered from 1 to a, the methods from a + 1 to a + m and the “end-of-class” markers from a + m + 1 to a + m + c − 1. When the representation of a solution is decoded to give the candidate class model, a simple array $\mathit{class} \in \{1, \ldots, c\}^{a+m}$ is constructed whose jth element gives the class (from 1 to c) of element j using the numbering scheme above for attributes and methods.
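The decoding step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `decode` and `is_valid` are hypothetical, and the sketch assumes that each “end-of-class” marker simply closes the current class, so adjacent markers (or a marker in first or last position) produce an empty, and therefore invalid, class.

```python
def decode(perm, a, m, c):
    """Map a permutation of 1..a+m+c-1 to the class (1..c) of each element.

    Attributes are numbered 1..a, methods a+1..a+m, and any larger value
    is treated as an "end-of-class" marker.
    """
    n_elements = a + m
    if sorted(perm) != list(range(1, n_elements + c)):
        raise ValueError("perm must be a permutation of 1..a+m+c-1")
    class_of = [0] * (n_elements + 1)  # index 0 unused: elements start at 1
    current = 1
    for item in perm:
        if item > n_elements:
            current += 1               # a marker closes the current class
        else:
            class_of[item] = current
    return class_of[1:]                # entry j-1 holds the class of element j


def is_valid(class_of, a, c):
    """True when every class 1..c holds at least one attribute and one method."""
    classes_with_attribute = set(class_of[:a])
    classes_with_method = set(class_of[a:])
    every_class = set(range(1, c + 1))
    return classes_with_attribute == every_class and classes_with_method == every_class
```

For example, with a = 2, m = 2 and c = 2 the single marker is item 5, and the permutation [1, 3, 5, 2, 4] places attribute 1 and method 3 in the first class and attribute 2 and method 4 in the second.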
BACKGROUND
Search-Based Software Engineering (SBSE) is now a well-established discipline, wherein search has been applied across the range of the software development lifecycle [18]. Historically, comparatively little research focus has been directed to the upstream stages of the software design, although this is recently being addressed, typically using metrics relating to coupling and cohesion to guide meta-heuristic search of design spaces such as the object-oriented modelling of design classes. Bowman et al. used a multi-objective EA to produce designs optimising a number of pre-specified metrics [1]. Simons and Parmee [12, 11] applied interactive EAs, using linear regression to learn a surrogate fitness model that combined coupling with a number of “elegance metrics” to approximate users’ subjective preferences. Working slightly later in the development life-cycle, Sievi-Korte et al. used a memetic algorithm based on an EA [9], and Vathsavayi et al. interleaved human and evolutionary adaptation of the usage of patterns [17]. However, as the name of the SBSE field suggests, potentially any search algorithm could be used, although in practice research effort has also tended to concentrate on Evolutionary Algorithms. As Harman [4] noted:
3.2 Evolutionary Algorithm
The EA chosen is a standard generational genetic algorithm. Parents are chosen by deterministic binary tournaments, and elitism is implemented so that the incumbent best solution is kept if not improved - replacing the least fit of the newly created offspring. We examined the use of both Edge Recombination and Order-Based Crossover. We do not provide a full algorithmic description of the EA as this is a well known algorithm with many descriptions online. Full details of the operators used can be found in e.g. [3]. There is now a significant body of evidence in favour of the use of self-adaptive mutation rates (see e.g. [7]), and also that for permutation based problems the “best” choice of mutation operator is not only problem-specific, or even instance-specific, but in fact relates to the current state of the search process [5, 8]. The operators we used were the Swap, Insert and Invert operators. Following one of the schemes proposed in [8] each individual contained a single gene encoding the mutation probability, and we adopted the following method for mutation:
We must be wary of the unquestioning adoption of evolutionary algorithms merely because they are popular and widely applicable or because, historically, other researchers have adopted them for SBSE problems; none of these are scientific motivations for adoption.

Simons and Smith compared the use of ACOs and EAs for this problem [13, 14], concluding that given sufficient computational budget, global search via EAs was more effective at finding high quality solutions than that using ACO. When
• With probability 0.2 the encoded value for mutation rate was changed to a new value, selected uniformly at random from the fixed set of allowed values.
We would like to gratefully thank the authors and maintainers for the public provision of the ACOTSP package version 1.02 [15]. The original code was modified to call the same fitness function designed for the EA for this application (see below) and the new Local Search algorithm implemented as described in the next section. Different numbers of ants were used as described below.
• This value was then multiplied by the representation length l, and the floor operator applied to determine the number of mutations to apply.
• One of the three mutation operators was then selected at random, and applied the number of times determined in the previous step.
3.4 Local Search Algorithm
A single-step greedy local search algorithm was implemented following the algorithm described in Figure 1. The move operator considered moves one element to a different class. The only moves considered are those that create valid solutions, and where the “receiving” class has an attribute (or method) that is used by (uses) the incoming method (attribute). This last condition does not assure that the fitness of the neighbour is superior to that of the incumbent, but rules out evaluating some of the least fit neighbours - at least in terms of coupling.
A fixed set of ten mutation rates was used, defined in terms of the representation length l as 1/l · {0.001, 0.002, 0.01, 0.02, 0.1, 0.2, 1, 2, MIN(0.25l, 5), MIN(0.25l, 10)}. A range of population sizes was used as detailed below.
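The self-adaptive mutation scheme of Section 3.2 can be sketched in Python as below. This follows a literal reading of the text and is not the authors' code: the names `mutation_counts` and `mutate` are hypothetical, the gene is stored as an index into the set of ten rates (an assumption made for convenience), and the exact definitions of Swap, Insert and Invert are the common textbook ones rather than anything confirmed by the paper. Note that under this literal reading every tabulated value below 1 yields zero mutations once the floor is applied; the original implementation may well treat these values differently.

```python
import math
import random

# Tabulated values from the text; each rate is value / l.
BASE_VALUES = [0.001, 0.002, 0.01, 0.02, 0.1, 0.2, 1, 2]


def mutation_counts(l):
    """Number of mutations, floor(rate * l), for each of the ten allowed rates."""
    values = BASE_VALUES + [min(0.25 * l, 5), min(0.25 * l, 10)]
    # rate * l equals the tabulated value, so the value is floored directly
    # to avoid floating-point round-off in (value / l) * l.
    return [math.floor(v) for v in values]


def swap(perm, rng):
    """Exchange the items at two distinct random positions."""
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]


def insert(perm, rng):
    """Remove the item at one random position and re-insert it at another."""
    i, j = rng.sample(range(len(perm)), 2)
    perm.insert(j, perm.pop(i))


def invert(perm, rng):
    """Reverse the order of the items between two random positions."""
    i, j = sorted(rng.sample(range(len(perm)), 2))
    perm[i:j + 1] = reversed(perm[i:j + 1])


def mutate(perm, gene, rng, p_change=0.2):
    """Return (mutated copy of perm, possibly re-sampled mutation gene)."""
    counts = mutation_counts(len(perm))
    if rng.random() < p_change:          # step 1: maybe pick a new rate
        gene = rng.randrange(len(counts))
    operator = rng.choice([swap, insert, invert])  # step 3: one operator
    child = list(perm)
    for _ in range(counts[gene]):        # step 2: floor(rate * l) applications
        operator(child, rng)
    return child, gene
```

Because all three operators only rearrange items, the child is always a permutation of the same items as its parent.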
3.3 Ant Colony Algorithm
Following some preliminary experimentation and guided by the findings reported in [13, 14] we applied a MAX-MIN ACO [16]. We here provide a brief description, considering the problem where the ACO is searching for a least-cost path through a set of nodes - for example the TSP. Full details may be found in Stuetzle’s papers referenced above. The ACO maintains an l × l “pheromone matrix” M, which reflects the search history, and an l × l matrix H of heuristic information, which between them define the probability distribution function for the generation of new solutions. In each iteration N ants are each placed at random at a starting node i : 1 ≤ i ≤ l and independently construct tours as follows:
BEGIN
  /* given a starting solution i */
  set fi = calculate_fitness(i);
  set set of possible moves S = {};
  Decode i into a set of classes C;
  /* Build up a list of possible moves */
  FOR EACH class c ∈ C DO
    IF (c contains more than one method) THEN
      FOR EACH method m ∈ c DO
        FOR EACH class k ∈ C, k ≠ c DO
          IF m uses an attribute in k THEN
            set S = S ∪ {m, k};
          FI
        OD
      OD
    FI
    IF (c contains more than one attribute) THEN
      FOR EACH attribute a ∈ c DO
        FOR EACH class k ∈ C, k ≠ c DO
          IF k contains a method that uses a THEN
            set S = S ∪ {a, k};
          FI
        OD
      OD
    FI
  OD
• At each node, the ant creates a list S of all the as-yet unvisited nodes, the heuristic and pheromone values associated with the relevant links.
• It then makes a selection from the list where the probability of moving from node i to node j is given by:

\[ P_{move}(ij) = \begin{cases} \dfrac{M_{ij}^{\alpha} \cdot H_{ij}^{\beta}}{\sum_{k \neq i,\, k \in S} M_{ik}^{\alpha} \cdot H_{ik}^{\beta}} & j \in S \\ 0 & \text{otherwise.} \end{cases} \tag{1} \]

After each ant has constructed a full solution its fitness f is measured. If best denotes the least cost path for generation t, and {ij} ∈ best is taken to mean that edge ij is traversed in that path, then the pheromone matrix M is updated at the end of each generation according to:

\[ M_{ij}^{t+1} = \begin{cases} (1 - \rho) \cdot M_{ij}^{t} + 1/f_{best} & \{ij\} \in best \\ (1 - \rho) \cdot M_{ij}^{t} & \text{otherwise.} \end{cases} \tag{2} \]
  /* Greedy search over the possible valid moves */
  Randomise Order of S;
  set improved = FALSE;
  WHILE ( S ≠ {} AND improved = FALSE ) DO
    take next possible move from S;
    apply move to create new solution j;
    set fj = calculate_fitness(j);
    IF ( fj < fi ) THEN
      set i = j;
      set improved = TRUE;
    ELSE
      remove move from S;
    FI
  OD
END
The key factors which distinguish the MAX-MIN ACO variants are that the pheromone matrix M is initialised to its maximum value, is only updated with the information from the best ant per generation, and that the pheromone levels are truncated to lie within a pre-specified range to avoid over saturation. Preliminary experimentation revealed that the default recommended settings appeared to provide robust performance, with the exception that the influence of “heuristic information” was removed by setting the parameter β = 0.0. The rest of the relevant parameters were: α = 1 (exponent of pheromone used to calculate next-node probabilities), ρ = 0.02 (pheromone decay rate).
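The tour construction rule of Eq. 1 and the pheromone update of Eq. 2 can be sketched in Python as follows. This is an illustrative sketch only, not the ACOTSP code: the function names are hypothetical, nodes are numbered from 0, edges are treated as directed, and the pheromone bounds `m_min` and `m_max` are placeholder values, since the paper does not state the range used. The sketch also assumes a strictly positive best fitness so that 1/f_best is defined.

```python
import random


def move_probabilities(i, unvisited, M, H, alpha=1.0, beta=0.0):
    """Eq. 1: selection probability for each unvisited node j, from node i."""
    weights = {j: (M[i][j] ** alpha) * (H[i][j] ** beta) for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def construct_tour(l, M, H, rng, alpha=1.0, beta=0.0):
    """One ant builds a permutation of the l nodes from a random start node."""
    current = rng.randrange(l)
    tour = [current]
    unvisited = set(range(l)) - {current}
    while unvisited:
        probs = move_probabilities(current, sorted(unvisited), M, H, alpha, beta)
        nodes = list(probs)
        current = rng.choices(nodes, weights=[probs[j] for j in nodes])[0]
        tour.append(current)
        unvisited.remove(current)
    return tour


def update_pheromone(M, best_tour, f_best, rho=0.02, m_min=0.01, m_max=10.0):
    """Eq. 2, followed by truncation to [m_min, m_max] (hypothetical bounds)."""
    best_edges = set(zip(best_tour, best_tour[1:]))
    for i in range(len(M)):
        for j in range(len(M)):
            deposit = 1.0 / f_best if (i, j) in best_edges else 0.0
            level = (1.0 - rho) * M[i][j] + deposit
            M[i][j] = min(m_max, max(m_min, level))
```

With β = 0.0, as used in the experiments, the heuristic term evaluates to 1 for every link, so the selection depends on the pheromone levels alone.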
Figure 1: Pseudocode of the Local Search algorithm
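A runnable reading of the pseudocode in Figure 1 is sketched below in Python. It is an interpretation rather than the authors' code: it operates on the decoded class array instead of the permutation, numbers elements from 0 (attributes 0..a-1, methods a..a+m-1), represents the use table as a set of hypothetical (attribute, method) pairs, and takes the fitness function as a parameter.

```python
import random


def candidate_moves(class_of, a, uses):
    """List the (element, target_class) moves permitted by Figure 1.

    A move is listed only if the source class keeps at least one element of
    the moved type, and the target class holds a partner that the element
    uses (or is used by).
    """
    n_attrs, n_meths = {}, {}
    for element, k in enumerate(class_of):
        counts = n_attrs if element < a else n_meths
        counts[k] = counts.get(k, 0) + 1
    moves = set()
    for attr, meth in uses:
        c_attr, c_meth = class_of[attr], class_of[meth]
        if c_attr == c_meth:
            continue                       # this use is already in-class
        if n_meths[c_meth] > 1:
            moves.add((meth, c_attr))      # method joins an attribute it uses
        if n_attrs[c_attr] > 1:
            moves.add((attr, c_meth))      # attribute joins a method using it
    return sorted(moves)


def local_search(class_of, a, uses, fitness, rng):
    """Single-step greedy search: accept the first improving move, if any."""
    f_i = fitness(class_of)
    moves = candidate_moves(class_of, a, uses)
    rng.shuffle(moves)                     # randomise the order of S
    for element, target in moves:
        neighbour = list(class_of)
        neighbour[element] = target
        f_j = fitness(neighbour)           # each call counts as one evaluation
        if f_j < f_i:
            return neighbour, f_j
    return list(class_of), f_i             # no improving neighbour was found
```

The search stops at the first improvement, matching the `improved` flag in the pseudocode, so at most one move is applied per call.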
4. METHODOLOGY
Table 1: Number of classes, Coupling (fCBO) and size-symmetry (fNAC) of Manual Software Designs

Name   Classes   fCBO   fNAC
CBS    5         15.4   13.69
GDP    5         29.7   43.2
SC     16        45.2   25.33
4.1 Fitness and Performance Metrics
For the initial experiments the fitness function used was based on the Coupling Between Objects measure. Based on the Use Cases provided in the documentation for the problems, and the numbering scheme outlined above, an l × l table U was constructed with:

\[ U_{ij} = \begin{cases} 1 & i \le a,\ a < j \le a + m,\ \text{method } j \text{ uses attribute } i \\ 0 & \text{otherwise.} \end{cases} \]

The coupling-related fitness (to be minimised) is then defined as the percentage of all uses that are “out of class”:

\[ f_{CBO} = 100 \cdot \frac{\sum_{i} \sum_{j,\, class(j) \neq class(i)} U_{ij}}{\sum_{i} \sum_{j} U_{ij}} \tag{3} \]

instance, N, and algorithm, followed by post-hoc testing for significant differences using Tukey’s HSD test. Regardless of metric, we often present rankings in the format A < {B, C} < D, which should be taken to mean that the values for A are significantly lower with more than 95% confidence than those for B, C and D. B is lower, but not significantly so, than C, and D is significantly higher than A, B and C.
4.2 Problem Instances
It is well known that optimising merely to reduce coupling tends to produce so-called “god” classes [2] - that is to say designs in which most attributes and methods are grouped in a single class. In practice software designers deprecate this “anti-pattern”, as it tends to lead to low cohesion (amongst other problems). Several authors have pursued the concept of “design elegance” (in this and other fields) and a few metrics have recently been proposed that directly reflect design symmetry in terms of the distribution of attributes and methods [11]. One readily calculable metric which has been shown to correlate strongly with users’ recorded feelings of “elegance” is the Numbers Among Classes (NAC): the arithmetic mean of the standard deviation of the numbers of attributes and methods among the classes of a design. This was truncated to the range [0, 6] and a fitness to be minimised calculated as:

\[ f_{NAC} = \frac{100}{6} \cdot \left( \frac{\sigma_{m_c}}{2} + \frac{\sigma_{a_c}}{2} \right) \tag{4} \]

where m_c and a_c denote the numbers of methods and attributes in class c. The lower this value, the more symmetrical the appearance of attributes and methods among the classes in the design, hence it tends to counterbalance the effect of the CBO metric. Thus in the second set of experiments we considered a combined fitness to minimise:

\[ f_{comb} = 0.5 \cdot (f_{CBO} + f_{NAC}) \]
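The three fitness measures (f_CBO of Eq. 3, f_NAC of Eq. 4 and their combination f_comb) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' fitness code: elements are numbered from 0, classes from 1 to c, the use table U is held as a set of hypothetical (attribute, method) pairs, and the population standard deviation is assumed because the paper does not say whether the population or sample form was used.

```python
import statistics


def f_cbo(class_of, uses):
    """Eq. 3: percentage of attribute/method uses that cross a class boundary."""
    out_of_class = sum(1 for attr, meth in uses
                       if class_of[attr] != class_of[meth])
    return 100.0 * out_of_class / len(uses)


def f_nac(class_of, a, c):
    """Eq. 4: scaled mean of the per-class spread of attributes and methods."""
    attrs = [0] * c
    meths = [0] * c
    for element, k in enumerate(class_of):
        (attrs if element < a else meths)[k - 1] += 1
    # Assumption: population standard deviation over the c classes.
    nac = 0.5 * (statistics.pstdev(meths) + statistics.pstdev(attrs))
    nac = min(max(nac, 0.0), 6.0)          # truncate NAC to the range [0, 6]
    return 100.0 / 6.0 * nac


def f_comb(class_of, a, c, uses):
    """Combined fitness: equal weighting of coupling and size symmetry."""
    return 0.5 * (f_cbo(class_of, uses) + f_nac(class_of, a, c))
```

Both components lie in the range 0 to 100, so the equal weighting gives neither measure a built-in advantage from scale.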
To aid comparison with other published works, we used the three software design problems detailed in [10] which span a range of sizes and complexities. The first (CBS) is a generalized abstraction of a Cinema Booking System, with 16 attributes, 15 methods and 39 method/attribute uses. The second (GDP) is a university system for student records with 43 attributes, 12 methods and 121 method/attribute uses. The third (SC) is based on an industrial case study for booking cruise holidays with 52 attributes, 30 methods and 126 method/attribute uses. The previous authors provide results for manual designs which we reproduce in Table 1 with kind permission. Both of the representations chosen permit models with variable numbers of classes to be present, since “end-of-class” markers could be adjacent in a permutation. In this paper, to facilitate easy comparisons with the results produced manually (which for the SC represented several hours’ work) we imposed the condition that evolved designs should contain the same number of classes as the manual ones. Candidate solutions for the CBS and GDP problems are therefore required to have 5 classes. Noting that the higher number of classes in the SC problem will mean that a large proportion of candidate solutions will violate the constraint of at least one attribute and method per class, we have created three different versions of the SC problem to try to tease apart the effects of size and constraint violation. These use 5, 10, and 16 classes, and are denoted SC5, SC10 and SC16 respectively in the results below.
(5)
Given that ultimately we are concerned with the use of these search heuristics embedded in an interactive design tool, and that we do not know an “optimal” fitness for each instance, we have chosen performance metrics for the search heuristics that reflect the speed and reliability of the ability to find good solutions. Each time a candidate solution is evaluated (whether in local or global search) a counter is incremented, and for each run on each problem we note the best solution discovered, and after how many evaluations that occurred. Algorithms are then compared according to Mean Best Fitness (MBF) and AES (the mean number of evaluations to find the best solution discovered). For reasons of space and clarity we present the results graphically in the form of plots of mean performance over each algorithm-instance and number of individuals N. SPSS v20 was used to carry out a one-way Analysis of Variance with dependent variables MBF and AES and fixed factors
4.3 Algorithm Parameters
The parameters specific to the ACO and EA are listed above. We ran experiments with 25, 50 and 100 individuals (i.e. members of the EA population, or ants respectively). The algorithms are hereafter referred to as aco, aco-LS, EAer, EAerLS, EAor and EAorLS where the lower case letters after “EA” denote the recombination operator used and the suffix LS denotes the memetic version. Each algorithm was run fifty times on each problem instance, with each run allowed to make 100,000 calls to the evaluation function.
5. RESULTS
5.1 Minimising fCBO
Figure 2 shows a comparison of the mean best fitness per
Analysing the performance on individual instances separately reveals that according to the combined metric, the ACO algorithms are worse on every problem. Again, results for EAor and EAorLS, particularly with larger population sizes, clearly show that the benefit in terms of both speed and quality for both fitness functions is more noticeable as the number of constraints increases. This supports our hypothesis that Local Search can act as a type of repair function for infeasible solutions.
run, separated by algorithm, problem instance, and number of individuals. The statistical analysis for MBF reveals the ranking: EAorLS < EAor < {EAer, acoLS, EAerLS} < {acoLS, EAerLS, aco}. The obvious conclusion from this is that overall the Order based crossover is preferable to Edge Recombination, and that apart from the EA using order crossover, the addition of Local Search does not provide a significant improvement in the MBF performance for coupling-based fitness.

A visual comparison of the results for SC5, SC10 and SC16 appears to confirm our hypothesis that it is the level of constraints, rather than necessarily the sheer scale of the problem, which causes a deterioration in the search process. Looking at the mean values for each algorithm there is a clear pattern that the best coupling achieved increases as the number of classes required increases. Intuitively this makes sense - if we optimise a design with c classes solely according to coupling, the system tends to produce 1 god class and c − 1 much smaller classes with only one or two methods and attributes in each. If we increase c, then there can necessarily be fewer elements in the god class and so the design almost certainly has more coupling. However it is notable that the EA variants also show an increase in the variability - for example the confidence intervals for SC16 are far larger than for the other instances, and considering this problem alone the performance of all algorithms is not statistically significantly different. In other words, the ACOs are less affected by constraints, but at the expense of missing the occasional discovery of good solutions which characterises the EAs, especially EAor and EAorLS.

Analysing MBF by Instance reveals that the significantly different subgroups are: CBS < SC5 < {GDP, SC10} < SC16. Analysing the results by N revealed that the algorithms with 25 individuals gave the best performance, and that N = 50 and N = 100 were not significantly different.
Analysis of the number of evaluations taken to find the best per run is illuminating. Figure 3 shows the mean number of evaluations per run after which the best individual was located. This reveals that this happens sooner for aco and acoLS, and that whereas the addition of local search speeds the discovery of good solutions for the EAor (except on SC5) it delays the process for EAer. This is backed up by the statistical analysis, which discriminates different subsets in the increasing order: {acoLS, aco} < EAorLS < EAor < EAer < EAerLS. Analysis by instance shows {CBS, GDP} < SC5 < SC10 < SC16. This completes the picture we observed above that increasing the number of attributes and methods, and hence the representation length, does not necessarily reduce the effectiveness of search (MBF) but does decrease its efficiency (AES). On the other hand, introducing constraints by requiring more classes impacts on both measures. Analysis by N shows the pattern 25 < 50 < 100.
5.3 Further experiments with ACO
The results above clearly show that for both fitness functions the ACO algorithms do not find as good solutions, but they find their best solutions much earlier in the run than their EA counterparts. One obvious reason could be that this is due to premature convergence, especially as the MAX-MIN algorithm used only updates the pheromone trail with information from the best ant per iteration. To examine this we re-ran the ACO experiments using both higher (ρ = 0.2) and lower (ρ = 0.002) rates of evaporation than the default value of 0.02. The MBF results with both values were statistically significantly worse than those with ρ = 0.02, and the time taken to find the best solutions was not significantly different.

Another possibility is that the problem arises because the values returned by our fitness functions lie in the range 0-100, typically 20-30, which is far lower than the values typically seen in TSP problems for which the software is designed. This would mean that the quantities of pheromone added to links on the best ant’s trail were relatively large, causing the pheromone matrix to “converge” fairly quickly to a set of values which represent a single path. This can be addressed in two ways - either by scaling the rate at which pheromones are laid down, or by using an exponent of less than one when calculating the probability of selecting different links during path construction. Accordingly we ran experiments with ρ = 0.2 and α = 0.7. These similarly failed to produce improvements in the performance relative to EAs.

Since both the ACO and the EA use exactly the same code (indeed are compiled against the same files) to implement the Local Search and the fitness functions, it would appear that we can rule out implementation errors. There is extensive literature reporting the successful and competitive use of ACO for permutation-type problems.
The final reason for the relatively poor performance of the many different parameterisations of MAX-MIN ACO tried would thus appear to be that the algorithm is designed to make extensive use of heuristic information, which was not available for these experiments as we wished to make a fair comparison with the EAs.
6. CONCLUSIONS
The results presented above show that in fact the use of local search is usually, but not always, beneficial to the Ant Colony Algorithm, whereas for the Evolutionary Algorithm using an order based recombination it is highly effective at improving both the quality and speed of optimisation. Across a range of parameter settings ACO found its best solutions earlier than EAs, but those solutions were of lower quality than those found by EAs. For both algorithms we demonstrated that the number of constraints present, which relates to the number of classes
5.2 Minimising fcomb
Figures 4 and 5 show the MBF and AES performance for the fitness function combining coupling and elegance, measured by fCBO and fNAC respectively. These show a very similar pattern of behaviour to above. The overall ranking by MBF is {EAorLS, EAor} < {EAer, EAerLS} < acoLS < aco, so for these fitness landscapes the addition of Local Search aids the aco algorithm. The ranking by AES is {acoLS, aco} < EAorLS < EAor < EAer < EAerLS.
[Figure 2 plot: 95% confidence intervals of Fcbo (y-axis, 0 to 40) against N (25, 50, 100), one panel per instance (CBS, GDP, SC5, SC10, SC16), one series per algorithm (aco, aco-LS, EAer, EAerLS, EAor, EAorLS).]
Figure 2: Comparison of Mean Best Fcbo values analysed by algorithm, instance and N

[Figure 3 plot: 95% confidence intervals of EV(Best Fcbo) (y-axis, 0 to 100000 evaluations) against N (25, 50, 100), one panel per instance (CBS, GDP, SC5, SC10, SC16), one series per algorithm (aco, aco-LS, EAer, EAerLS, EAor, EAorLS).]
Figure 3: Comparison of Number of Evaluations after which Best Fcbo value per run was discovered analysed by algorithm, instance and N.
[Figure 4 plot: 95% confidence intervals of Fcomb (y-axis, 0 to 40) against N (25, 50, 100), one panel per instance (CBS, GDP, SC5, SC10, SC16), one series per algorithm (aco, aco-LS, EAer, EAerLS, EAor, and EAorLS, labelled “ERordLS” in the legend).]
Figure 4: Comparison of Mean Best Fcomb values analysed by algorithm, instance and N

[Figure 5 plot: 95% confidence intervals of Ev(Best Fcomb) (y-axis, 0 to 100000 evaluations) against N (25, 50, 100), one panel per instance (CBS, GDP, SC5, SC10, SC16), one series per algorithm (aco, aco-LS, EAer, EAerLS, EAor, and EAorLS, labelled “ERordLS” in the legend).]
Figure 5: Comparison of Number of Evaluations after which Best Fcomb value per run was discovered analysed by algorithm, instance and N.
[7] S. Meyer-Nieberg and H.-G. Beyer. Self-adaptation in evolutionary algorithms. In F. G. Lobo, C. F. Lima, and Z. Michalewicz, editors, Parameter Setting in Evolutionary Algorithms, volume 54 of Studies in Computational Intelligence, pages 47–75. Springer, 2007.
[8] M. Serpell and J. Smith. Self-Adaption of Mutation Operator and Probability for Permutation Representations in Genetic Algorithms. Evolutionary Computation, 18(3):1–24, Feb. 2010.
[9] O. Sievi-Korte, E. Mäkinen, and T. Poranen. Simulated Annealing for Aiding Genetic Algorithm in Software Architecture Synthesis. Technical Report D-2010-019, Department of Computer Sciences, University of Tampere, Jan. 2010.
[10] C. Simons. Case study specifications, available at http://www.cems.uwe.ac.uk/~clsimons/casestudies.
[11] C. Simons and I. Parmee. Elegant object-oriented software design via interactive, evolutionary computation. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 42(6):1797–1805, 2012.
[12] C. Simons, I. Parmee, and R. Gwynllyw. Interactive, Evolutionary Search in Upstream Object-Oriented Class Design. Software Engineering, IEEE Transactions on, 36(6):798–816, 2010.
[13] C. Simons and J. Smith. A Comparison of Evolutionary Algorithms and Ant Colony Optimization for Interactive Software Design. In Proceedings of the 4th Symposium on Search Based-Software Engineering, page 37, 2012.
[14] C. L. Simons and J. E. Smith. A Comparison of Meta-heuristic Search for Interactive Software Design. arXiv preprint arXiv:1211.3371, accepted for publication in the journal Soft Computing, pages 1–32, Oct. 2013.
[15] T. Stuetzle. Acotsp, version 1.2, available at http://www.aco-metaheuristic.org/aco-code.
[16] T. Stuetzle and H. H. Hoos. Max-min ant system. Future Generation Computer Systems, 16(8):889–914, 2000.
[17] S. Vathsavayi, H. Hadaytullah, and K. Koskimies. Interleaving human and search-based software architecture design. Proceedings of the Estonian Academy of Sciences, 62(1):16, 2013.
[18] Y. Zhang. Repository of publications on search-based software engineering.
created, has a far bigger impact on solution quality and time than the size of the problem in terms of numbers of attributes and methods. The analysis of the different parameters for the ACO also revealed that the reason for the relatively poor performance probably lies with its reliance on heuristic information to complement that stored in the pheromone matrix. In this paper we chose not to use such information, since it was not available to the Evolutionary Algorithm, but it would be relatively straightforward to construct the equivalent of a distance matrix H (see Eq. 1). A simple method would be to measure distance via use patterns - for example setting Hij = 1 if method i used attribute j, Hij = 2 if methods i and j shared a common attribute k, or if attributes i and j were both used by method k, and so on.

The original hypothesis of this paper was that the addition of Local Search to create memetic algorithms would to a large extent reduce the impact of the choice of global search heuristic. Certainly for the single-step local search method used here, the results conclusively disprove that hypothesis. While this is a shame from the perspective of easily incorporating user-provided hints etc., the reliance that we have exposed on heuristic information for the ACO algorithm does at least point a way forward.
7. REFERENCES
[1] M. Bowman, L. C. Briand, and Y. Labiche. Solving the Class Responsibility Assignment Problem in Object-Oriented Analysis with Multi-Objective Genetic Algorithms. IEEE Transactions on Software Engineering, 36(6):817–837, Nov. 2010.
[2] W. Brown, R. Malveau, H. McCormick, and T. Mowbray. Anti-Patterns: Refactoring Software, Architectures, and Projects in Crisis. Wiley, 1998.
[3] A. E. Eiben and J. E. Smith. Introduction to evolutionary computing. Springer: Heidelberg, Berlin, New York, 2003.
[4] M. Harman. Software Engineering Meets Evolutionary Computation. Computer, 44(10):31–39, 2011.
[5] N. Krasnogor and J. Smith. Emergence of profitable search strategies based on a simple inheritance mechanism. Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-2001, pages 432–439, 2001.
[6] N. Krasnogor and J. Smith. Competent memetic algorithms: model, taxonomy and design issues. IEEE Transactions on Evolutionary Computation, 9:474–488, 2005.