Combining Search Space Diagnostics and Optimisation

I. Moser
Swinburne University of Technology
Melbourne, Australia
Email: [email protected]

M. Gheorghita
Swinburne University of Technology
Melbourne, Australia
Email: [email protected]
Abstract—Stochastic optimisers such as Evolutionary Algorithms outperform random search due to their ability to exploit gradients in the search landscape, formed by the algorithm's search operators in combination with the objective function. Research into the suitability of algorithmic approaches to problems has been made more tangible by the direct study and characterisation of the underlying fitness landscapes. Authors have devised metrics, such as the autocorrelation length, to help define these landscapes. In this work, we contribute the Predictive Diagnostic Optimisation method, a new local-search-based algorithm which provides knowledge about the search space while it searches for the global optimum of a problem. It is a contribution to a less researched area which may be named Diagnostic Optimisation.
I. INTRODUCTION

The notion of a fitness landscape in optimisation (used interchangeably with the term search space in this work) originates in the work of Wright [31], who depicted genetic neighbourhoods as contour maps with peaks and valleys. Fitness landscapes arise from three components [11]: a set of possible solutions, an objective function which determines the quality or fitness of these solutions, and a distance measure defined by the number of 'steps' or neighbourhood moves which separate one solution from another. Weinberger [28] defines a fitness landscape as 'rugged' when there is little correlation between the fitnesses of neighbouring solutions. Stochastic optimisation relies on the existence of gradients toward optima, local or global. Following the minimisation paradigm, these gradients, whose length can be measured using autocorrelation [28], form 'basins' with an optimum at the bottom. Intuitively, the fewer basins the search landscape has, and the more pronounced these are, the better we can expect a local search to perform. In this work, we employ a novel local-search-based stochastic optimiser to gather information about these basins in the search landscapes while searching for the optimum. The empirical evaluation is based on several well-known instances of the Quadratic Assignment Problem (QAP), and a neighbourhood defined by the transposition (swap) operator, where two solutions are neighbours when they differ in two allocations.
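To make this setting concrete, the following minimal Python sketch shows the QAP objective and the transposition neighbourhood assumed throughout this paper; the names (qap_fitness, swap_neighbours, flow, dist) are illustrative, not taken from any implementation discussed here.

```python
import numpy as np

def qap_fitness(perm, flow, dist):
    # QAP objective: facility i is placed at location perm[i];
    # cost sums the flow between every facility pair weighted by
    # the distance between their assigned locations.
    return np.sum(flow * dist[np.ix_(perm, perm)])

def swap_neighbours(perm):
    # Transposition neighbourhood: all permutations that differ
    # from perm in exactly two allocations.
    n = len(perm)
    for i in range(n - 1):
        for j in range(i + 1, n):
            q = perm.copy()
            q[i], q[j] = q[j], q[i]
            yield q
```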
II. CHARACTERISING FITNESS LANDSCAPES

Since the aim of this work is to evaluate a new means of describing features of the search space, one of the main tasks is to show whether the findings of the new method coincide with existing characterisation techniques. Only a few metrics have been found to describe aspects of fitness landscapes. Autocorrelation and its derivative, the correlation length [28], were arguably the first descriptive metrics, followed by Fitness Distance Correlation (FDC) [15] and Elementary Landscape Analysis (ELA) [26]. Where possible, the respective measures have been calculated for the QAP problems used for this experimental work to investigate how they characterise the features of the fitness spaces.

A. Autocorrelation

Given a number s of independent walks through the solution space, each with a predefined number k of neighbourhood steps, the autocorrelation is calculated as given in eq. 1.
\rho(k) = \frac{1}{s} \sum_{i=1}^{s} \frac{(\bar{f}(x_0) - f(x_{0i}))(\bar{f}(x_k) - f(x_{ki}))}{\sigma_{f(x_0)} \, \sigma_{f(x_k)}}    (1)
Here, over several neighbourhood move sequences, the solution x0 is always the initial random solution and f(x0i) its fitness value in walk i. The solution xki is the final solution after k steps. The averages over the objective function values for the initial and final solutions are f̄(x0) and f̄(xk) respectively, and σf(x0) and σf(xk) denote the standard deviations of f(x0) and f(xk) respectively. The autocorrelation function for a single step, ρ(1), describes the 'nearest neighbour correlation of the landscape' [26]. The larger ρ(1), the smaller the expected fitness difference caused by a move in the defined neighbourhood. Based on the autocorrelation function for a single step, the autocorrelation length l [26] can be calculated (eq. 2).

l = -\frac{1}{\ln(|\rho(1)|)}    (2)
Values of ρ(1) closer to 1 indicate a strong correlation; a ρ of 0 indicates the absence of any correlation. Angel and Zissimopoulos [2] found that the larger the autocorrelation length l, the better suited the problem is for local search.
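The sketch below estimates ρ(1) and the correlation length from random walks, using the Pearson correlation across s walks as an estimator of eq. 1. It reuses the qap_fitness helper defined earlier; the remaining names and parameter defaults are illustrative assumptions.

```python
import numpy as np

def autocorrelation(flow, dist, k=1, s=200, seed=0):
    # Estimate rho(k): correlate the fitnesses of s random starting
    # solutions with the fitnesses reached after k random swap moves.
    rng = np.random.default_rng(seed)
    n = len(flow)
    f_start, f_end = [], []
    for _ in range(s):
        perm = rng.permutation(n)
        f_start.append(qap_fitness(perm, flow, dist))
        for _ in range(k):
            i, j = rng.choice(n, size=2, replace=False)
            perm[i], perm[j] = perm[j], perm[i]
        f_end.append(qap_fitness(perm, flow, dist))
    return np.corrcoef(f_start, f_end)[0, 1]

rho1 = autocorrelation(flow, dist, k=1)  # assumes flow, dist exist
length = -1.0 / np.log(abs(rho1))        # correlation length, eq. 2
```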
B. Fitness Distance Correlation

Jones [15] first devised the FDC as a measure of problem difficulty for Genetic Algorithms. Its informative value has been the subject of some debate [24], [1]. Authors usually agree that the FDC is a useful measure of problem difficulty for a particular algorithm when the distance metric is aligned with the search step of the algorithm [15], [1].

C_{FD} = \sum_{i=1}^{s} \frac{(\bar{f}(x^+) - f(x^+_i))(\bar{d}(x^*, x^+) - d(x^*, x^+_i))}{\sigma_{f(x^+)} \, \sigma_{d(x^*, x^+)}}    (3)

Eq. 3 describes the calculation of the FDC, which is designed to establish the correlation of the local optima (x+) with the global optimum (x*) with respect to their fitnesses f(x+_i) and their distances d(x*, x+_i) to the global optimum. Values close to unity are indicative of a positive correlation, i.e. the fitnesses of the local optima increase with increasing proximity to the global optimum. Problems with FDC values around 0.0, showing no correlation between local and global optima, are considered difficult. A negative FDC indicates that the quality of the local optima increases with increasing distance to the global optimum. To measure the FDC for the current QAP instances, the Hamming distance between the known global optimum and the 50 local optima sampled has been calculated using the difference in assignments to locations. The 50 local optima were found using steepest descent (SD).
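A minimal sketch of this measurement, assuming the 50 steepest-descent optima, their fitnesses, and the known global optimum are available as permutation arrays; the Pearson correlation serves as the normalised form of eq. 3.

```python
import numpy as np

def hamming(perm_a, perm_b):
    # number of facilities assigned to different locations
    return int(np.sum(np.asarray(perm_a) != np.asarray(perm_b)))

def fitness_distance_correlation(optima, fitnesses, global_opt):
    # Correlate local-optimum fitnesses with their Hamming
    # distances to the known global optimum (eq. 3).
    d = [hamming(x, global_opt) for x in optima]
    return np.corrcoef(fitnesses, d)[0, 1]
```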
C. Elementary Landscapes

As the notion of a landscape results from the concept of a neighbourhood, which in turn is defined by the chosen local search operator, some authors have analysed and compared the landscapes evoked by varying search operators to investigate the most successful search moves for given problems. Grover's [10] wave equation is at the origin of this research, which identifies regular and symmetric neighbourhoods which lend themselves to local search. Neighbourhoods are said to be regular if the number of neighbouring solutions is identical for every solution, and symmetric if the neighbourhood operator permits the move from solution x to y as well as from y to x. Stadler [26] subsequently coined the term elementary for landscapes which have these properties and obey Grover's wave equation. The diverse neighbourhoods of the Travelling Salesperson Problem (TSP) in particular have been studied extensively in the context of elementary landscapes [10], [29], [30], [6], [3]. If a landscape is elementary, all local optima are on the superior side of the mean of the fitness [10]. Codenotti and Margara [6] observed that the fitness of any neighbour of a solution x can be expected to lie between f(x) and the true mean of the fitness f̄. Once the elementary nature of the landscape of an operator on a problem has been established, it is possible to estimate the fitness of a neighbouring solution using eq. 4 or a neighbourhood estimate introduced by Whitley and Sutton [30]. One of the drawbacks implied by Codenotti and Margara's [6] observation is that the expected fitness improvement brought about by a neighbouring solution will be different depending on the current solution's location in the search space. The correctness of the expected value depends largely on the statistical isotropy of the landscape. Also, it is usually impossible to calculate the exact mean fitness f̄. Whitley and Sutton [29] have provided a decomposition of Grover's wave equation suited to analysing the neighbourhoods of problems (eq. 4). It describes the effect of one defined neighbourhood move on the fitness of the solution.

E[f(y)] = f(x) - p_1 f(x) + p_2 \left( \frac{1}{p_3} \bar{f} - f(x) \right)    (4)
For the given QAP problem and the chosen transposition move, the ratio p_1 = 2/n quantifies the change in existing assignments: 2 of the n assignments that comprise the solution have been unassigned for the change. Two new assignments out of a total of n^2 - n possible assignments (reduced by the n assignments that form part of the initial solution x) have been substituted, hence p_2 = 2/(n^2 - n); and a total of n assignments out of n^2 constitute the solution, represented by p_3 = n/n^2. Substituting these in the above equation and applying simple algebra, we obtain eq. 5.

E[f(y)] = f(x) + \frac{2n^2}{(n^2 - n)n} \left( \bar{f} - f(x) \right)    (5)
This yields a neighbourhood size of d = (n^2 - n)/2 and a constant factor k = n. According to Barnes et al. [4], for smooth elementary landscapes, the ratio k/d takes values between 0 and 1. Values above unity are indicative of landscape ruggedness. Given the outcomes listed in table I, the transposition operator is well suited for the QAP, as the resulting landscape is smooth.
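As a sanity check, the expectation of eq. 5 can be compared against the exhaustive neighbourhood average, which by definition the two should match on an elementary landscape. Note also the worked ratio: for n = 20, k/d = 2n/(n^2 - n) = 2/19 ≈ 0.11, matching the Chr20a, Nug20 and Tai20a entries in table I. The sketch reuses the qap_fitness and swap_neighbours helpers from above, approximates the landscape mean f̄ by sampling (the exact mean is usually unavailable), and assumes flow and dist matrices exist.

```python
import numpy as np

def neighbour_mean(perm, flow, dist):
    # exhaustive mean fitness over all (n^2 - n)/2 swap neighbours
    return np.mean([qap_fitness(q, flow, dist)
                    for q in swap_neighbours(perm)])

def expected_neighbour_fitness(fx, f_mean, n):
    # eq. 5, with f_mean an estimate of the landscape mean
    return fx + 2 * n**2 / ((n**2 - n) * n) * (f_mean - fx)

rng = np.random.default_rng(1)
n = len(flow)
f_mean = np.mean([qap_fitness(rng.permutation(n), flow, dist)
                  for _ in range(10000)])  # sampled landscape mean
x = rng.permutation(n)
print(expected_neighbour_fitness(qap_fitness(x, flow, dist), f_mean, n),
      neighbour_mean(x, flow, dist))
```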
D. Characterising Example Problems

TABLE I: Common fitness landscape metrics (autocorrelation with steepest descent (SD) and random walks (RW), fitness distance correlation and elementary landscape analysis) applied to the example QAP instances used.

Problem    SD k≈10   RW k=10   RW k=1    l        FDC C_FD   ELA k/d
Chr20a     0.09      0.14      0.83      5.33     0.03       0.11
Chr25a     0.24      0.17      0.79      4.16     0.07       0.08
Kra30a     0.34      0.27      0.77      3.76     0.30       0.07
Kra32      0.37      0.41      0.92      12.58    0.03       0.06
Nug20      0.21      0.14      0.88      7.49     0.29       0.11
Nug30      0.36      0.27      0.88      7.50     0.19       0.07
Ste36a     0.35      0.43      0.78      3.94     0.25       0.06
Ste36b     0.37      0.47      0.84      5.84     0.22       0.06
Tai20a     0.13      0.15      0.79      4.31     0.03       0.11
Tai30b     0.05      0.27      0.92      11.86    0.24       0.07
Tai100b    0.02      0.75      0.95      23.27    0.46       0.02
1) Chr20a and Chr25a: The Chr20a and Chr25a instances are known to be among the hardest of the Chr* problems. Christofides and Benavent [5] created these QAP problems randomly, with flow matrix structures that vary between the extremes of 'bushy trees' and 'straight lines'. The authors found that the problems at the extreme ends (entirely bushy or entirely elongated) are easy to solve, whereas the intermediate structures are difficult. Chr20a has 7 end nodes (nodes with a degree of one); Chr25a has 13 of these single-degree nodes.
Fig. 1 shows that these problems have the largest spread of fitness among the problem instances, suggesting a landscape with shallow as well as deep troughs. According to the FDC, the quality of these optima does not correlate with the proximity to the global optimum.

2) Kra30a and Kra32: Kra30a is a hospital design problem with 256 known global optima formulated by Krarup and Pruzan [18] in 1978. The Kra32 instance is based on Kra30a, to which two facilities have been added to complete the grid representation. All Kra* instances are variations of the same hospital and can therefore be expected to be homogeneous and very similar. The distribution of local optima shown in fig. 1 is almost identical and supports the background knowledge we have of these instances. Surprisingly, the autocorrelation lengths for Kra30a and Kra32 are almost at opposite ends of the possible range, indicating that Kra32 has solutions which are correlated even when 12 neighbourhood steps apart. The FDC provides some indication that some of the local optima of Kra30a have higher fitnesses as the distance to the global optimum decreases, whereas Kra32 shows no correlation.

3) Nug20 and Nug30: These instances of the problems by Nugent et al. [22] are variations of a rectangular factory layout with 20 and 30 departments respectively. Therefore, a homogeneous search space structure is likely. The search space metrics as well as the distribution of optima (fig. 1) underpin the notion of a homogeneous 'basin' structure. The correlation length is almost identical; only over larger distances does Nug20 seem slightly less correlated. The FDC, however, shows a slightly stronger correlation between the local and global optima for Nug30 than for Nug20.

4) Ste36a and Ste36b: These instances by Steinberg [27] are variations of a computer component placement problem with a view to minimising the lengths of the connecting wires. Since computer components have different sizes, the search space structure may be diverse. The box plots of the instances' local optima (fig. 1) show that while both instances have a large spread in the fitnesses of the local optima, the range of Ste36b is approximately twice as large. All autocorrelation metrics, however, show that Ste36b is more correlated than Ste36a, whereas the FDC indicates an almost equal, though not very strong, correlation between local and global optima for both Ste** problems.

5) Tai20a, Tai30b and Tai100b: Tai100b has previously been found to have a high fitness distance correlation, indicating a structured fitness landscape [20], a finding corroborated by our FDC results. All Tai**b problems are deemed to exhibit correlated local optima [20], but according to our FDC results, Tai30b ranks considerably lower on this scale than Tai100b. In general, Tai**b instances are considered structured, whereas Tai**a instances are considered rugged [20]. This description is supported by the landscape metrics: Tai20a is among the least correlated of all problem instances, whereas Tai30b and Tai100b have very high correlation lengths. However, the local optima of Tai30b are of a very high diversity, which may account for a lower FDC value.
Fig. 1: Distribution of optima in different problems. Values have been normalised for comparability. Larger fitness values are superior.
III. STOCHASTIC OPTIMISATION AND LANDSCAPE ANALYSIS

Stochastic black-box approaches have been used to find optima to combinatorially complex NP-hard problems for decades. They are essentially a random search with a bias towards creating candidate solutions which, based on previous feedback from the objective function, appear likely to score a high fitness value. This adjustment of the search can be seen as an implicit learning mechanism. In the domain of discrete combinatorial optimisation, which is the focus of this research, Genetic Algorithms (GA) [9] and Simulated Annealing (SA) [17] are examples of the oldest and most well-established approaches. GAs acquire their search bias through a selection process which keeps a population of solutions with high fitness values, whereas SA searches in the neighbourhood of a single solution by narrowing the acceptance range of new solutions over time. Other approaches, such as Ant Colony Optimisation (ACO) [8] or Estimation of Distribution Algorithms (EDA) [12], are based on the paradigm of deriving probabilities of creating high-quality solutions from the structure and fitness of solutions created earlier. The desired search bias is therefore made explicit in the probability matrices maintained. Numerous researchers have extended these basic techniques by refining the bias adjustment of the search which is typical of stochastic optimisation [7], [19], [23]. Surprisingly, researchers in stochastic optimisation do not usually show an interest in collecting information about the problem or its fitness space which can be made available to the practitioner. The underlying opinion in the stochastic optimisation field seems to be that these black-box approaches should provide acceptable solutions without the user knowing about the search landscape. Paradoxically, a large share of
the publications in evolutionary computation and related fields concentrate on comparing the suitability of different black-box approaches for different problem spaces. The comparison of the applicability of GAs and local search is only one of many examples [16], [25], [21]. Since the commonalities of the problems investigated remain elusive, this work has been painstaking due to a lack of generalisability. The algorithm proposed in this work is best categorised as Diagnostic Optimisation, a group of approaches which combine fitness landscape diagnostics with a black-box search. Among the existing black-box approaches, EDA has the best potential for adaptation to suit the characteristics of Diagnostic Optimisation. Most EDA approaches produce a probabilistic model as a by-product of the search. Hauschild et al. [14], [13] capture the probabilistic models produced by the hierarchical Bayesian Optimisation Algorithm (hBOA) when optimising Ising spin glasses, trap-5 and the MAXSAT problem. Their goal is to use the resulting distributions to prime the optimisation process of other instances of the same problems, which leads to a significant increase in the speed of the search. Ideally, however, Diagnostic Optimisation algorithms should give an indication of which search algorithms or neighbourhood moves would suit the problem best. It would also be desirable for a good optimiser to diagnose when the search space is too easy for a stochastic solver; for instance, the solution space may be unimodal, in which case a black-box approach is a wasteful technique to optimise it.

A. Predictive Diagnostic Optimisation

Knowing the structure of a problem's fitness landscape can ultimately lead to the identification of ideal or near-ideal approaches to solving the problem. Performing a random walk (regardless of bias applied) in a fitness landscape simply to characterise its structure to benefit future optimisation may be an impractical approach. The function evaluations may be costly, or there may be few instances to optimise, so that the knowledge gathered during the diagnostic walk may be useful for few actual optimisation processes. Therefore, a feature detection algorithm should ideally undertake not only a landscape analysis but also a search for the (approximate) optimum. The Predictive Diagnostic Optimisation (PDO) algorithm combines the two tasks of gathering information about the search space structure and finding approximately optimal solutions. The current and initial PDO implementation uses steepest descent (SD) as its deterministic optimisation method, an expensive approach which exhaustively explores the complete neighbourhood of a solution before making the move that leads to the best fitness improvement. It is also intuitively clear that SD may not find the globally optimal solution. Among all possible starting points for SD, it is meaningful to choose the solution with the best prospects of leading to a global optimum. Hence the core technique of PDO is the prediction of the ultimate solution quality to be expected after executing all SD moves until no further improvement is found.
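For reference, a minimal sketch of steepest descent under the swap neighbourhood, reusing the qap_fitness helper from above. A production implementation would use delta evaluation rather than recomputing the full objective for every neighbour; that optimisation, like the function names here, is an assumption beyond what the paper specifies.

```python
def steepest_descent(perm, flow, dist):
    # Exhaustively scan the swap neighbourhood and apply the best
    # improving move until a local optimum is reached.
    perm = perm.copy()
    best = qap_fitness(perm, flow, dist)
    while True:
        best_move = None
        for i in range(len(perm) - 1):
            for j in range(i + 1, len(perm)):
                perm[i], perm[j] = perm[j], perm[i]   # try swap
                f = qap_fitness(perm, flow, dist)
                perm[i], perm[j] = perm[j], perm[i]   # undo swap
                if f < best:
                    best, best_move = f, (i, j)
        if best_move is None:
            return perm, best                         # local optimum
        i, j = best_move
        perm[i], perm[j] = perm[j], perm[i]
```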
The prediction is based on the first step of the SD from an initial random solution. The ratio of the improvement achieved by the first step to the fitness improvement after the full descent forms a predictor. A predictor can then be matched to new initial solutions and their first SD moves. When, after the local optimisation, it is observed that none of the existing predictors was able to predict the ultimate locally optimal fitness, a new predictor is created based on this solution. As predictors are created dynamically whenever the existing predictors are unable to predict the quality of the optimum to a predefined margin of accuracy, the number of predictors created during the run can be used as an indicator of the homogeneity of the basins of attraction (assuming minimisation). If the slopes of these basins, or troughs with a local minimum at the bottom, all have the same shape, a single predictor will suffice to predict the outcome of a full steepest descent. In addition to the number of predictors created, the distribution of usage of these predictors elucidates the relative frequency of a certain shape of the path to a local optimum. However, as the search is deliberately biased toward optimising more promising solutions, the metrics of predictor count and predictor usage are illustrative of an above-average part of the fitness landscape.

1) Algorithm: Algorithm 1 outlines the PDO algorithm. Initially, the algorithm creates a solution x0 uniformly at random. This solution is subsequently optimised using SD. After the first step of SD, the ratio of improvement between the initial solution x0 and the improved solution x1 is calculated as stated in eq. 6, which assumes minimisation. Note that the ratio is corrected for the cardinality of the problem by dividing by f(x0)/n.

\delta = \frac{f(x_0) - f(x_1)}{f(x_0)/n}    (6)

The fitness at the end of the SD sequence is used to calculate the ratio between the initial improvement of the first step and the fitness of the local optimum f(xk), as shown in eq. 7. Both ratios are used to create the first predictor. The first ratio is used to match the predictor to an initial solution before predicting its fitness after local optimisation. Additional predictors are created during the optimisation process whenever the existing predictors are found inaccurate beyond a tolerance ε.
\gamma = \frac{f(x_0) - f(x_k)}{f(x_0) - f(x_1)}    (7)

At each iteration within the allowed number of function evaluations FE, a predefined number c of candidate solutions are created randomly. One SD step is applied to each solution. The ratio δ is calculated on the basis of the fitness difference for each pair of initial solution x0 and its improved mutation x1 according to eq. 6. A comparison between the ratios δ of the candidate solutions and the available predictors identifies the closest predictor. For each candidate solution xi, the closest predictor predicts the fitness f(xki) to be expected at the end of a sequence of steepest descent. SD is then applied to the candidate solution xi with the best fitness prediction.
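A minimal sketch of the predictor bookkeeping implied by eqs. 6 and 7; the class and function names are illustrative assumptions, not the paper's implementation.

```python
class Predictor:
    def __init__(self, f0, f1, fk, n):
        self.delta = (f0 - f1) / (f0 / n)    # first-step ratio, eq. 6
        self.gamma = (f0 - fk) / (f0 - f1)   # full-descent ratio, eq. 7

    def predict(self, f0, f1):
        # invert eq. 7 to estimate the local-optimum fitness
        return f0 - self.gamma * (f0 - f1)

def closest_predictor(predictors, delta):
    # match a candidate's first-step ratio to the nearest predictor
    return min(predictors, key=lambda p: abs(p.delta - delta))
```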
At the end of the steepest descent, the actual fitness f(xki) is compared to the predicted fitness. If the combined differences between the actual and predicted fitnesses after the first step and after the full steepest descent exceed the tolerance level ε, the predictor is considered not accurate enough for the local optimum in question. Before adding a new predictor, however, the combined differences between all other predictors and the solution sequence at hand are calculated. If a sufficiently close predictor exists, no new predictor is created. The steps of the algorithm are listed as algorithm 1.

Algorithm 1 Predictive Diagnostic Optimisation
 1: procedure PredictiveDiagnosticOptimisation(FE, c, ε)
 2:   x0 ← randomSolution
 3:   SteepestDescent(x0)
 4:   CreatePredictor(in: δ, out: pred1)
 5:   while i < FE do
 6:     for i ← 1, c do
 7:       xi ← randomSolution
 8:       FirstStepSteepestDescent(in: xi, out: δ)
 9:       for j ← 1, predSize do
10:         if |δ(predj) − δ(xi)| < δmin then
11:           δmin ← |δ(predj) − δ(xi)|
12:           predclosest ← predj
13:         end if
14:       end for
15:       PredictFitness(in: xi, predclosest, out: fitness)
16:       if fitness < bestPredicted then
17:         bestPredicted ← fitness
18:         xmaxPred ← xi
19:       end if
20:     end for
21:     SteepestDescent(xmaxPred)
22:     CalcCombError(in: xmaxPred, predclosest, out: αclosest)
23:     if αclosest > ε then
24:       createNew ← true
25:       for j ← 1, predSize do
26:         CalcCombError(in: xmaxPred, predj, out: αj)
27:         if αj < ε then
28:           createNew ← false
29:         end if
30:       end for
31:     end if
32:     if createNew then
33:       CreatePredictor(out: predn)
34:     end if
35:   end while
36: end procedure
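The combined error computed by CalcCombError is described only qualitatively above. One plausible reading, stated here purely as an assumption, sums the relative mismatches of the first-step and full-descent ratios:

```python
def combined_error(pred, delta_obs, gamma_obs):
    # Relative mismatch of first-step and full-descent ratios.
    # Summing the two terms is an assumption; the text only says
    # the two differences are "combined".
    return (abs(pred.delta - delta_obs) / abs(delta_obs)
            + abs(pred.gamma - gamma_obs) / abs(gamma_obs))
```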
IV. PDO AS AN OPTIMISER

A. Benchmark

One of the more recent applications of stochastic algorithms to the QAP is Merz and Freisleben's [20] Memetic Algorithm (MA), which integrates a local search procedure into a Genetic Algorithm (GA). This work discusses the inclusion of different recombination and mutation operators into the GA in combination with the local search. Having compared the alternative operators, the authors concluded in favour of an implementation which combines the CX crossover operator, distance-preserving mutation and a stochastic local search procedure.
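The distance-preserving mutation described below repeats swaps until the child reaches a predefined distance from its parent; the following sketch is an illustrative reading of that description (with permutations as NumPy arrays), not Merz and Freisleben's code.

```python
import numpy as np

def distance_mutation(parent, target_distance, rng):
    # Repeat random transpositions until the child differs from
    # the parent in at least target_distance positions.
    child = parent.copy()
    while np.sum(child != parent) < target_distance:
        i, j = rng.choice(len(child), size=2, replace=False)
        child[i], child[j] = child[j], child[i]
    return child
```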
The algorithm initialises a relatively small population randomly and optimises the solutions using the stochastic local search before applying CX crossover, followed by a mutation operator which repeats transpositions (swaps) until a predefined distance from the parents is reached. Mating selection is made uniformly at random, whereas the next generation is created by selecting the best individuals from both parent and child populations. Regular restarts are made to counteract the early convergence which is facilitated by the small population of 10 to 40 individuals. In spite of Merz and Freisleben's [20] finding that a different configuration performs better on structured problems, we use this formulation of the algorithm consistently as a benchmark. A complete listing is given in the same publication [20].

B. Result Quality

The means and standard deviations of 50 trials of both algorithms solving the selected QAP instances are given in table II. Both algorithms were allowed an equal number of function evaluations. The results have been normalised, hence larger values are superior. It is clear that this initial PDO version, though not devised with a view to superseding other stochastic approaches, outperforms the specialised GA on all instances. PDO can also be regarded as more reliable, since the result quality varies consistently less than that of the results produced by the GA.

TABLE II: Means, medians and standard deviations of fitnesses produced by PDO and MA.

Problem    PDO Mean  PDO Median  PDO Std.Dev.  MA Mean  MA Median  MA Std.Dev.
Chr20a     0.95      0.94        0.02          0.85     0.85       0.06
Chr25a     0.88      0.87        0.03          0.78     0.78       0.05
Kra30a     0.99      0.99        0.01          0.95     0.95       0.01
Kra32      0.99      0.99        0.01          0.95     0.95       0.01
Nug20      1.00      1.00        0.00          0.98     0.98       0.01
Nug30      0.99      0.99        0.00          0.97     0.97       0.01
Ste36a     0.98      0.98        0.01          0.94     0.95       0.02
Ste36b     0.97      0.97        0.01          0.88     0.88       0.05
Tai20a     0.99      1.00        0.00          0.97     0.97       0.01
Tai30b     1.00      1.00        0.00          0.97     0.98       0.03
Tai100b    0.98      0.98        0.01          0.96     0.96       0.01
As the results for the 50 trial runs are not normally distributed, we used the nonparametric Mann-Whitney U test to show that the quality of the results produced by PDO is significantly different from that produced by the MA. The difference is significant at the 99% level for each problem (p-values of 0.000).
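A sketch of this test using SciPy, assuming the 50 normalised results per algorithm and instance are available as arrays (pdo_results and ma_results are illustrative names):

```python
from scipy.stats import mannwhitneyu

# pdo_results, ma_results: 50 normalised fitnesses per instance
stat, p_value = mannwhitneyu(pdo_results, ma_results,
                             alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")  # p < 0.01 for every instance
```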
V. PDO DIAGNOSTICS

Each of the QAP problem instances was optimised 50 times using the PDO algorithm. The stopping criterion was set to 10 000 000 function evaluations and the error margin ε was set to 5%, i.e. a new predictor was created each time no existing predictor was able to match the first-step quality and the quality of the local optimum, combined, with an accuracy of 5%. The number of predictors, averaged over 50 trials, is shown for each instance in fig. 2. A comparison of problem instances with equal cardinality n (Ste36a and Ste36b) shows that they can have very different numbers of predictors, and it would seem that the predictor usage of the optimiser accurately reflects the 'ruggedness' of the search space without being influenced by the well-known, and therefore less interesting, complexity due to cardinality. It is important to note that the algorithm implements a deliberate bias toward exploration of the more promising basins. Only the best of c = 5 random candidate solutions is optimised to the local optimum; thus the predictors collectively describe approximately the best fifth of the search space. It is clear that we cannot expect a direct correlation between the local optima in fig. 1 and the number of predictors found. The number of predictors is largely influenced by the diversity of the slopes of the basins. Nug20 and Nug30 are known to be variations of the same problem, hence the almost equal numbers of predictors corroborate the assumption of homogeneous basin shapes.
Fig. 2: Number of predictors produced by PDO, averaged over 50 runs.

Even though the number of predictors may be large in spite of local optima with equal fitnesses, the inverse is unlikely to be true: the same predictor is very unlikely to predict both a shallow and a deep basin. Both Ste36a and Ste36b have optima with very diverse fitnesses, and they also produce the most predictors by a large margin. The spread of fitness in local optima is twice as large in Ste36b as in Ste36a, and Ste36b produces almost twice as many predictors. Similarly, Kra30a, Kra32, Nug20 and Nug30 have very small differences in local optima and produce a very small number of predictors. These instances have also been documented to describe problems with little variation. The predictor numbers for Tai20a and Tai30b reflect the existing knowledge of Tai**a instances being 'well behaved', whereas the Tai**b problems were randomly created with large variations in the fitnesses of optima. Chr20a and Chr25a produce the largest number of predictors of all explored instances. This coincides with the large spread of optima visible in fig. 1 and the assessment of Christofides and Benavent [5], who regard these problems as difficult.

The box plots of the usages of individual predictors (fig. 3) show the distribution of the usages of the predictors.
The columns on the left represent the usage of each predictor as a percentage of the overall usages; the columns on the right show the fitnesses f(xk) of the SD sequences used to create the respective predictors. Again, showing the fitnesses as percentages of the (known) optimum means that larger values are superior. As predictors are only used on the most promising of c candidate solutions, predictors with small predicted fitnesses are unlikely to be created. Therefore, the differences between the fitnesses of the predictors remain small. Nonetheless, each problem instance has one predictor which is used significantly more often than all the others. Tai20a is not included in the illustration since it exhibits only a single predictor. This is consistent with the assumption that the Tai**a problems are very homogeneous. The most-used predictor of Tai30b (fig. 3f), applied 50% of the time, predicts only a quality of 78% of the possible optimum. We can conclude that this fitness value is prevalent in the upper 20% of the fitnesses of local optima. Chr20a and Chr25a both produce many predictors, one of which is predominantly used. The remaining predictors are more evenly applied in Chr20a than in Chr25a, indicating a more even basin shape in the most promising 20% of the search space of Chr20a. From the usage distributions of the predictors depicted in fig. 3 we can approximate the proportion of basins similar to that of the global optimum. Even if the global optimum is unknown, the decision to apply local search only to solutions which are likely to produce fitnesses that exceed the fitness of these prevalent optima can save function evaluations. From fig. 3 it becomes clear that for an algorithm based on prediction like PDO, irregular or 'rugged' landscapes with diverse basin shapes (such as Ste36a, fig. 3c) are more instructive than regular landscapes such as Kra32 (fig. 3e). With an abundance of predictors, a greedy algorithm can choose to ignore all but the candidate solutions which have the highest predicted fitness.

A. Comparison of Search Space Metrics

The existing metrics of autocorrelation, FDC and elementary landscapes describe landscapes from viewpoints that only partially correlate to the basin homogeneity described by the PDO predictors. The elementary landscape analysis concludes that all QAP instances used in this study have elementary fitness landscapes when traversed using the transposition move which is employed by the steepest descent PDO uses. As all problem instances studied here have an elementary landscape, we have no comparison to assess the usefulness of ELA. One of the drawbacks of ELA is that the mean fitness has to be known to exploit the properties of the landscape. This necessitates some preliminary probing into the search space to acquire even a sample mean. High correlation lengths are also believed to be indicative of suitability for local search. The correlation length of Kra32 is four times the equivalent value for Kra30a, yet the result quality produced by both algorithms is equal for both instances (table II).
Fig. 3: Predictor usages shown for each individual predictor produced by a single trial of the algorithm, for (a) Chr20a, (b) Chr25a, (c) Ste36a, (d) Tai100b, (e) Kra32 and (f) Tai30b. The columns to the left show the usage of each predictor as a percentage of all usages. The predicted fitnesses of each predictor are shown as percentages of the optimum (superior fitnesses are larger).
However, Kra30a has far too few predictors to warrant displaying; hence, using PDO, we can accurately predict a higher basin homogeneity for Kra30a than for Kra32 (fig. 3e). FDC is descriptive of whether the local optima 'slope towards' the global optimum (values close to +1) or away from it (values close to -1). This value is most helpful for predicting the success of algorithms that employ 'global moves', i.e. change a large part of the solution at once while still preserving some of the previous solution, such as recombination. For the assessment of the potential of PDO it is less useful, as there is no requirement for good-quality local optima to reside in the vicinity of the globally best solution. Also, FDC requires a known global optimum, which makes it a less useful candidate for assessing the potential of search algorithms to use on a problem.

VI. CONCLUSION

This work introduces PDO, a very simple optimisation approach which is based on predictive local search. It is one of few approaches that could be described as Diagnostic Optimisation, as it provides information about the search landscape topology, in particular the distribution of 'basin' shapes, at the same time as delivering approximate solutions to the problem. As a stochastic optimiser, it seems to perform at least as well as the state-of-the-art adaptation to the problem type used as a benchmark, although more experimentation will be needed on more complex problems and more problem types. The information returned by the algorithm describes the homogeneity of the basins in the landscape accurately, as far as its accuracy can be assessed based on existing knowledge about the problem instances. The information obtained as a by-product of the search is also meaningful in that the descriptions of the different instances vary significantly enough to distinguish the landscapes. The descriptive properties of the predictors are enhanced by knowledge of the fitness distributions of local optima, which is a metric that can easily be recorded during the algorithm's progress. A very interesting topic for future research will be the question of how this description of the landscape can be used to predict the performance of well-known stochastic optimisers.

REFERENCES

[1] L. Altenberg. Fitness distance correlation analysis: An instructive counterexample. In Proceedings of the Seventh International Conference on Genetic Algorithms, pages 57–64. Morgan Kaufmann, 1997.
[2] E. Angel and V. Zissimopoulos. Autocorrelation coefficient for the graph bipartitioning problem. Theoretical Computer Science, 191:229–243, 1998.
[3] J. Barnes and B. Colletti. Local search structure in the symmetric travelling salesperson problem under a general class of rearrangement neighborhoods. Applied Mathematics Letters, 14(1):105–108, 2001.
[4] J. W. Barnes, B. Dimova, S. P. Dokov, and A. Solomon. The theory of elementary landscapes. Applied Mathematics Letters, 16:337–343, 2002.
[5] N. Christofides and E. Benavent. An exact algorithm for the quadratic assignment problem on a tree. Operations Research, 37(5):760–768, 1989.
[6] B. Codenotti and L. Margara. Local properties of some NP-complete problems. Technical report, International Computer Science Institute, 1992.
[7] K. Deb and H.-G. Beyer. Self-adaptive genetic algorithms with simulated binary crossover. Complex Systems, 9:431–454, 1999.
[8] L. Gambardella and M. Dorigo. Ant-Q: A reinforcement learning approach to the traveling salesman problem. In Proceedings of ML-95, Twelfth International Conference on Machine Learning, pages 252–260. Morgan Kaufmann, 1995.
[9] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[10] L. K. Grover. Local search and the local structure of NP-complete problems. Operations Research Letters, 12(4):235–243, 1992.
[11] M. Hauschild and M. Pelikan. Advanced neighborhoods and problem difficulty measures. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO '11, pages 625–632, 2011.
[12] M. Hauschild and M. Pelikan. An introduction and survey of estimation of distribution algorithms. Swarm and Evolutionary Computation, 1(3):111–128, 2011.
[13] M. W. Hauschild and M. Pelikan. Intelligent bias of network structures in the hierarchical BOA. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO '09, pages 413–420, New York, NY, USA, 2009. ACM.
[14] M. W. Hauschild, M. Pelikan, K. Sastry, and D. E. Goldberg. Using previous models to bias structural learning in the hierarchical BOA. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO '08, pages 415–422, New York, NY, USA, 2008. ACM.
[15] T. Jones and S. Forrest. Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In Proceedings of the Sixth International Conference on Genetic Algorithms, pages 184–192. Morgan Kaufmann, 1995.
[16] T. Jones and S. Forrest. Genetic algorithms and heuristic search. Working papers, Santa Fe Institute, 1995.
[17] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[18] J. Krarup and P. M. Pruzan. Computer-aided layout design. In Mathematical Programming in Use, volume 9 of Mathematical Programming Studies, pages 75–94. Springer Berlin Heidelberg, 1978.
[19] D. S. Lee, V. S. Vassiliadis, and J. M. Park. A novel threshold accepting meta-heuristic for the job-shop scheduling problem. Computers & Operations Research, 31(13):2199–2213, 2004.
[20] P. Merz and B. Freisleben. Fitness landscapes, memetic algorithms, and greedy operators for graph bipartitioning. Evolutionary Computation, 8:61–91, 2000.
[21] M. Mitchell, J. H. Holland, and S. Forrest. When will a genetic algorithm outperform hill climbing? In Advances in Neural Information Processing Systems 6, pages 51–58. Morgan Kaufmann, 1993.
[22] C. E. Nugent, T. E. Vollman, and J. Ruml. An experimental comparison of techniques for the assignment of facilities to locations. Operations Research, 16:150–173, 1968.
[23] M. Randall. Near parameter free ant colony optimisation. In M. Dorigo, M. Birattari, C. Blum, L. Gambardella, F. Mondada, and T. Stützle, editors, Ant Colony Optimization and Swarm Intelligence, volume 3172 of Lecture Notes in Computer Science, pages 262–285. Springer Berlin / Heidelberg, 2004.
[24] T. Schiavinotto and T. Stützle. A review of metrics on permutations for search landscape analysis. Computers & Operations Research, 34:3143–3153, October 2007.
[25] C. Skinner and P. J. Riddle. Random search can outperform mutation. In IEEE Congress on Evolutionary Computation '07, pages 2584–2590, 2007.
[26] P. F. Stadler. Landscapes and their correlation functions, 1996.
[27] L. Steinberg. The backboard wiring problem: a placement algorithm. SIAM Review, 3:37–50, 1961.
[28] E. Weinberger.
Correlated and uncorrelated fitness landscapes and how to tell the difference. Biological Cybernetics, 63:325–336, 1990. [29] D. Whitley, A. M. Sutton, and A. E. Howe. Understanding elementary landscapes. In Proceedings of the 10th annual conference on Genetic and evolutionary computation, GECCO ’08, pages 585–592, New York, NY, USA, 2008. ACM. [30] L. D. Whitley and A. M. Sutton. Partial neighborhoods of elementary landscapes. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, GECCO ’09, pages 381–388, New York, NY, USA, 2009. ACM. [31] S. Wright. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress of Genetics, 1:356–66, 1932.