Preprint 0 (1997)
Landscapes, Operators and Heuristic Search
Colin R Reeves School of Mathematical and Information Sciences Coventry University UK E-mail:
[email protected]
Heuristic search methods have been increasingly applied to combinatorial optimization problems. While a specific problem defines a unique search space, different `landscapes' are created by the different heuristic search operators used to search it. In this paper, a simple example will be used to illustrate the fact that the landscape structure changes with the operator; indeed, it often depends even on the way the operators are applied. Recent attention has focused on trying to understand better the nature of these `landscapes'. Recent work by Boese et al. [2] has shown that instances of the TSP are often characterised by a `big valley' structure in the case of a 2-opt exchange operator, and a particular distance metric. In this paper their work is developed by investigating the question of how landscapes change under different search operators in the case of the $n/m/P/C_{\max}$ flowshop problem. Six operators and 4 distance metrics are defined and the resulting landscapes examined. The work is further extended by proposing a statistical randomisation test to provide a numerical assessment of the landscape. Other conclusions relate to the existence of ultrametricity, and to the usefulness or otherwise of hybrid neighbourhood operators.

Keywords: Heuristics, flowshop sequencing
1. Introduction

The metaphor of a landscape is commonly used to aid the understanding of heuristic search methods for solving a combinatorial optimization problem (COP). We can define such problems as follows: we have a discrete search space $X$, and a function

$$f : X \mapsto \mathbb{R}.$$

The general problem is to find

$$x^* = \arg\min_{x \in X} f(x),$$

where $x$ is a vector of decision variables and $f$ is the objective function. The vector $x^*$ is a global optimum; along with the idea of a landscape is the idea that there are many local optima or false peaks in the objective function, which may have the unfortunate effect of trapping a search algorithm before it can locate the global optimum. The landscape metaphor is a helpful one in some senses, but it can also be dangerous. It is tempting to identify the landscape with the search space $X$: to treat them as if the labels `landscape' and `search space' apply to the same objects. However, the landscape concept only really makes sense in the context of an associated neighbourhood structure, without which the related ideas of local optima have no meaning. A formal treatment of neighbourhood structures is given elsewhere (Hohn and Reeves [4,5]); here we motivate the ideas by an intuitive approach based on a simple example.

1.1. An example
In practice, a neighbourhood structure is generated by the application of an operator which transforms a given vector $x$ into a new vector $x'$. For example, if the solution is represented by a binary vector (as is often the case for genetic algorithms, for instance), a simple neighbourhood might consist of all vectors obtainable by complementing one of the bits. Consider the problem of maximizing a simple function

$$f(x) = x^3 - 60x^2 + 900x + 100$$

where the solution $x$ is represented by a vector of length 5, of 0s and 1s. By decoding this binary vector as an integer $x$ in the range $[0, 31]$ it is possible to evaluate $f$. In terms of $x$ there is a single maximum at $x = 10$, and the `landscape' is a smooth continuous unimodal function in that range; however, the discrete optimization problem obtained by the binary coding turns out to have 4 optima when a `single bit complement' operator (SBC) is used, i.e., a new vector $x'$ is obtained from $x$ by complementing a single bit. The neighbours of (0 0 0 0 0), for example, would be (1 0 0 0 0), (0 1 0 0 0), (0 0 1 0 0), (0 0 0 1 0) and (0 0 0 0 1). If a `steepest ascent' strategy is used (i.e., from a given vector the best
Table 1. Local optima and basins of attraction for steepest ascent

Local optimum        Basin
0 1 0 1 0 (4100)     00000 00001 00010 00011 00101 01000 01001 01010 01011 01101
                     01110 01111 10101 11000 11001 11010 11011 11101 11110 11111
0 1 1 0 0 (3988)     00100 01100 11100
0 0 1 1 1 (3803)     00110 00111 10110 10111
1 0 0 0 0 (3236)     10000 10001 10010 10011 10100
neighbour is identified before a move is made) the local optima and their basins of attraction are as shown in Table 1. On the other hand, if a `next ascent' strategy is used (where the next change which leads uphill is accepted without ascertaining if a better one exists), the basins of attraction are as shown in Table 2. In the case of next ascent, the order of searching the vector also affects the landscape. In Table 2 the order is `forward' (left-to-right), but if the search is made in the reverse direction (right-to-left) the basins of attraction are different, as shown in Table 3. However, the single-bit complement operator is not the only mechanism for generating neighbours. An alternative neighbourhood could be defined as follows: for $k = 1, \ldots, 5$ complement only bits $k, \ldots, 5$. Thus, the neighbours of (0 0 0 0 0), for example, would be (1 1 1 1 1), (0 1 1 1 1), (0 0 1 1 1), (0 0 0 1 1) and (0 0 0 0 1). This creates a very different landscape. In fact, there is now only a single
Table 2. Local optima and basins of attraction for next ascent (forward search) using SBC operator

Local optimum        Basin
0 1 0 1 0 (4100)     00101 00110 01001 01010 01011 01101 01110
                     10101 10110 11001 11010 11011 11101 11110
0 1 1 0 0 (3988)     00100 01000 01100 10100 11000 11100
0 0 1 1 1 (3803)     00111 01111 10111 11111
1 0 0 0 0 (3236)     00000 00001 00010 00011 10000 10001 10010 10011
Table 3. Local optima and basins of attraction for next ascent (reverse search) using SBC operator

Local optimum        Basin
0 1 0 1 0 (4100)     01000 01001 01010 01011
0 1 1 0 0 (3988)     01100 01101 01110 01111
0 0 1 1 1 (3803)     00000 00001 00010 00011 00100 00101 00110 00111
1 0 0 0 0 (3236)     10000 10001 10010 10011 10100 10101 10110 10111
                     11000 11001 11010 11011 11100 11101 11110 11111
global optimum (0 1 0 1 0); every vector is in its basin of attraction. There are two interesting facts about this operator. Firstly, it is in fact closely related to the one-point crossover operator frequently used in genetic algorithms. (For that reason, it has been called [4] the complementary crossover or CX operator.) Secondly, if the 32 vectors in the search space are re-coded using a Gray code, it is easy to show that the neighbours of a point in Gray-coded space under SBC are identical to those in the original binary-coded space under CX. This is an example of an isomorphism of landscapes. More details of the mathematical background to these and similar phenomena can be found in [4,5].

1.2. Landscape topology
The above example is not of sufficiently general interest to be pursued here, but it does illustrate very neatly some of the problems associated with trying to understand the nature of a landscape in a COP. Unfortunately, in many instances of a COP, it is not easy to construct the landscape in terms of its underlying representation as a metric space. Rather, a landscape is induced by the particular operator which is used to define the neighbourhoods, from which it may not be easy explicitly to derive the representation. Thus, it is of interest to consider alternative ways of characterizing the nature of the induced landscape.

More generally, the question of what a landscape `looks like' is of some relevance to the way in which it should be searched by a heuristic technique. Recently, Boese et al. [2] have suggested that, in the cases of the travelling salesman problem (TSP) and graph bisection, local optima tend to be relatively close to each other (in terms of a plausible metric), and to the global optimum (where it was known). Thus, for at least some problem landscapes (and possibly many), there is a `big valley' structure, where local optima occur in clusters. If this is indeed the case, it would support the idea of generating new start points for search from a previous local optimum rather than from a random point in the search space: good candidate solutions are usually to be found `fairly close' to other good solutions. This in turn provides the motivation for some of the `perturbation' methods [6,11,15] which currently appear to be among the best available for such COPs as the TSP. It should also be noted that other methods, such as genetic algorithms, and some versions of tabu search (for example, those that use `path relinking' [3]) also implicitly rely on the existence of such a structure.
However, it is not known whether this phenomenon is general, or whether it is specific to the cases previously examined. In extending it to other COPs, some interesting questions arise as to the way in which `closeness' can be measured, and as to how the significance of the observed behaviour can be assessed. In this paper, the well-known $n/m/P/C_{\max}$ flowshop sequencing problem is studied in an attempt to shed further light on this question.
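The binary example of Section 1.1 is small enough to enumerate exhaustively. The following Python sketch is illustrative only: the function names are ours, next ascent is assumed to rescan from the first bit after every accepted move, and the Gray code is assumed to be the standard reflected one, $g = x \oplus \lfloor x/2 \rfloor$. Under these assumptions it recomputes the basins of attraction for SBC under the three strategies, checks that CX leaves a single optimum, and checks the Gray-code correspondence between CX and SBC.

```python
from collections import Counter

N_BITS = 5

def f(x: int) -> int:
    """Objective of Section 1.1, with x the integer decoded from the bits."""
    return x**3 - 60 * x**2 + 900 * x + 100

# Single-bit masks, leftmost (most significant) bit first: 16, 8, 4, 2, 1.
FORWARD = [1 << k for k in reversed(range(N_BITS))]
REVERSE = FORWARD[::-1]

def sbc(x, order=FORWARD):
    """Single bit complement: flip one bit, scanning bits in the given order."""
    return [x ^ mask for mask in order]

def cx(x):
    """Complementary crossover: flip bits k..5, i.e. XOR with 31, 15, 7, 3, 1."""
    return [x ^ ((1 << j) - 1) for j in range(N_BITS, 0, -1)]

def steepest_ascent(x, neighbours):
    """Move to the best neighbour until none improves on the current point."""
    while True:
        best = max(neighbours(x), key=f)
        if f(best) <= f(x):
            return x
        x = best

def next_ascent(x, neighbours):
    """Accept the first improving neighbour (assumed: rescan from the start)."""
    while True:
        for y in neighbours(x):
            if f(y) > f(x):
                x = y
                break
        else:  # no neighbour improved, so x is a local optimum
            return x

def basin_sizes(search, neighbours):
    """Map each local optimum to the number of start points that reach it."""
    return dict(Counter(search(x, neighbours) for x in range(2**N_BITS)))

def gray(x):
    """Standard reflected Gray code of x (an assumption of this sketch)."""
    return x ^ (x >> 1)

steepest = basin_sizes(steepest_ascent, sbc)
forward = basin_sizes(next_ascent, sbc)
reverse = basin_sizes(next_ascent, lambda x: sbc(x, REVERSE))
cx_basins = basin_sizes(steepest_ascent, cx)

# CX neighbours in binary space map onto SBC neighbours in Gray-coded space.
gray_match = all(
    sorted(gray(y) for y in cx(x)) == sorted(sbc(gray(x)))
    for x in range(2**N_BITS)
)

print(steepest, forward, reverse, cx_basins, gray_match)
```

With these conventions the basin sizes for the optima 10, 12, 7 and 16 come out as 20, 3, 4, 5 (steepest ascent), 14, 6, 4, 8 (forward next ascent) and 4, 4, 8, 16 (reverse next ascent), in agreement with Tables 1 to 3, while under CX all 32 vectors lead to 10.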
2. Flowshop sequencing

The permutation flowshop sequencing problem $n/m/P/C_{\max}$ is one in which $n$ jobs have to be processed (in the same order) on $m$ machines. The object is to find the permutation of jobs that will minimise the makespan, i.e. the time at which the last job is completed on machine $m$. It is defined as follows: suppose the processing time is $p(i, j)$ for job $i$ on machine $j$, and the job permutation is denoted by $\pi = \{\pi_1, \pi_2, \ldots, \pi_n\}$. Then the job completion times $C(\pi_i, j)$ are as follows:

$$\begin{aligned}
C(\pi_1, 1) &= p(\pi_1, 1) \\
C(\pi_i, 1) &= C(\pi_{i-1}, 1) + p(\pi_i, 1) && \text{for } i = 2, \ldots, n \\
C(\pi_1, j) &= C(\pi_1, j-1) + p(\pi_1, j) && \text{for } j = 2, \ldots, m \\
C(\pi_i, j) &= \max\{C(\pi_{i-1}, j),\, C(\pi_i, j-1)\} + p(\pi_i, j) && \text{for } i = 2, \ldots, n;\ j = 2, \ldots, m \\
C_{\max} &= C(\pi_n, m)
\end{aligned}$$

In order to find good solutions to instances of this problem, there are many possible NS operators, each generating a different landscape, and each needing a suitable metric for measuring the distance between solutions. In principle, the distance between two solutions $\pi$ and $\pi'$ on a landscape should be measured by the minimal number of applications of the operator which would convert $\pi$ into $\pi'$. However, in general no polynomial algorithm is known for calculating this number. There are a number of surrogate distance metrics that could be used in such cases. In this paper, 4 metrics have been used: The adjacency metric is found simply by counting the number of times $n_{adj}$ a pair $i, j$ of jobs is adjacent in both $\pi$ and $\pi'$. The distance is then measured
by $n - n_{adj} - 1$. This is the uni-directional version; we may relax the adjacency requirement to cover the case of either $i, j$ or $j, i$ being present, thus obtaining a similar bi-directional adjacency metric. The precedence metric tries to be a little more sophisticated. Rather than simply look at adjacencies, the number of times job $j$ is preceded by job $i$ in both $\pi$ and $\pi'$ is counted. To get a `distance', this quantity is subtracted from $n(n-1)/2$. The position-based metric takes the principle one stage further, by comparing the actual positions in the sequence of job $j$ in each of $\pi$ and $\pi'$. For a sequence $\pi$ we can define the `inverse' permutation $\sigma$, where the position of job $\pi_i$ is given by $\sigma_{\pi_i} = i$. The position-based metric is then just

$$\sum_{j=1}^{n} |\sigma_j - \sigma'_j|.$$
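The makespan recurrence and the four surrogate metrics translate almost line for line into code. The sketch below is a minimal illustration rather than the code used for the experiments: function names are ours, jobs are assumed to be labelled 0 to n-1, and `p[job][machine]` is assumed to hold the processing times.

```python
from itertools import combinations

def makespan(perm, p):
    """C_max of the job sequence perm, via the completion-time recurrence."""
    m = len(p[0])
    done = [0] * m  # done[j]: completion time of the previous job on machine j
    for job in perm:
        for j in range(m):
            ready = done[j - 1] if j > 0 else 0  # this job on the previous machine
            done[j] = max(done[j], ready) + p[job][j]
    return done[-1]

def positions(perm):
    """Inverse permutation: positions(perm)[job] is the position of that job."""
    pos = [0] * len(perm)
    for i, job in enumerate(perm):
        pos[job] = i
    return pos

def adjacency_distance(a, b, directed=True):
    """n - n_adj - 1, with n_adj the number of adjacent pairs common to a and b."""
    pairs_b = set(zip(b, b[1:]))
    if not directed:  # bi-directional version: (i, j) or (j, i) both count
        pairs_b |= {(y, x) for x, y in pairs_b}
    n_adj = sum(pair in pairs_b for pair in zip(a, a[1:]))
    return len(a) - n_adj - 1

def precedence_distance(a, b):
    """n(n-1)/2 minus the number of job pairs in the same relative order."""
    pa, pb = positions(a), positions(b)
    n = len(a)
    agree = sum(
        (pa[i] < pa[j]) == (pb[i] < pb[j])
        for i, j in combinations(range(n), 2)
    )
    return n * (n - 1) // 2 - agree

def position_distance(a, b):
    """Sum over jobs of the absolute difference in position."""
    return sum(abs(x - y) for x, y in zip(positions(a), positions(b)))

# Tiny hypothetical instance: 2 jobs, 2 machines.
p = [[3, 2], [1, 4]]
c_01 = makespan([0, 1], p)
c_10 = makespan([1, 0], p)

# Two sequences differing by one adjacent swap.
a, b = [0, 1, 2, 3], [1, 0, 2, 3]
d_adj1 = adjacency_distance(a, b)
d_adj2 = adjacency_distance(a, b, directed=False)
d_prec = precedence_distance(a, b)
d_posn = position_distance(a, b)
```

On this toy data the two makespans are 9 and 7, and a single adjacent swap gives distances of 2 (uni-directional adjacency), 1 (bi-directional adjacency), 1 (precedence) and 2 (position-based). All four metrics return 0 for identical sequences.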
We thus have 4 metrics to use as surrogates for the `true' metric which of course depends on the particular operator being used. Boese et al. [2] used an adjacency-based metric, which seems reasonable in the context of the TSP, where it is relative order that counts, rather than absolute position in the sequence. Which is the `best' approximation to the true one in the general case is difficult to judge, but we might expect that for the flowshop problem the order in which they have been listed above is in increasing order of their effectiveness at discriminating between different local optima. While adjacency is an important property in solving the TSP, as Boese et al. [2] have argued, it is unlikely to be so relevant in this case, where the actual position of a job in the sequence is much more likely to be important.

2.1. NS operators for sequencing
Several operators have been proposed in previous work on permutation or sequencing problems. Here we investigate 6 ways in which one solution can be transformed into another. All of these, with the possible exception of inversion, have frequently been used in applications to flowshop sequencing. Adjacent pairwise exchange (APEX) is perhaps the simplest: the positions of two adjacent jobs $\{\pi_i, \pi_{i+1}\}$ are exchanged. Inversion (INV) is the selection of a sub-vector of $\pi$, say $\pi_r, \ldots, \pi_s$, and reversing its order.
Exchange (EX) is an obvious generalization of APEX: all pairs of jobs are eligible for exchange of positions, not just adjacent ones. Forward shift (FSH) takes a job i from its current position and inserts it after job j (where j > i). Backward shift (BSH) is like FSH, but with the sense reversed: job i is removed from its current position and inserted before job j (where j < i). Double shift (DSH) is the combination of FSH and BSH. In the case of APEX, each sequence has $(n-1)$ neighbours; for DSH each sequence has $n(n-1)$ neighbours. In the other cases, there are $n(n-1)/2$ neighbours of a given sequence. We will refer to this as the size of the operator. There are some interesting relationships among these operators. It is clear that APEX is the weakest, in that any sequence which is optimal with respect to any of the other operators is also optimal with respect to APEX. In that sense, we could say that APEX is subsumed by the others. Similarly, DSH subsumes FSH and BSH. However, neither INV nor EX is subsumed by any other operator, although it is probable that on the average DSH would produce better solutions, given that its neighbourhood is twice as large. Apart from these obvious facts, any investigation of the relative efficacy of these operators in a particular case will necessarily be empirical. The type of questions that are interesting are: whether one operator of the same size tends to outperform another; how often a local optimum with respect to one operator is also locally optimal with respect to another; and what sort of `landscape' is induced by each operator.
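The six operators and their sizes can be checked by direct enumeration. In the sketch below (our own illustrative code, with a sequence represented as a tuple and `i`, `j` read as positions in the sequence, which is an assumption about the intended meaning) each operator is a generator of neighbouring sequences.

```python
def apex(s):
    """APEX: swap the jobs in adjacent positions i and i+1."""
    for i in range(len(s) - 1):
        yield s[:i] + (s[i + 1], s[i]) + s[i + 2:]

def exchange(s):
    """EX: swap the jobs in any two positions i < j."""
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            t = list(s)
            t[i], t[j] = t[j], t[i]
            yield tuple(t)

def inversion(s):
    """INV: reverse the sub-sequence between positions r and e (r < e)."""
    for r in range(len(s)):
        for e in range(r + 1, len(s)):
            yield s[:r] + s[r:e + 1][::-1] + s[e + 1:]

def forward_shift(s):
    """FSH: remove the job in position i and reinsert it after position j > i."""
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            yield s[:i] + s[i + 1:j + 1] + (s[i],) + s[j + 1:]

def backward_shift(s):
    """BSH: remove the job in position i and reinsert it before position j < i."""
    for i in range(len(s)):
        for j in range(i):
            yield s[:j] + (s[i],) + s[j:i] + s[i + 1:]

def double_shift(s):
    """DSH: all forward shifts followed by all backward shifts."""
    yield from forward_shift(s)
    yield from backward_shift(s)

OPERATORS = {
    "APEX": apex, "INV": inversion, "EX": exchange,
    "FSH": forward_shift, "BSH": backward_shift, "DSH": double_shift,
}

seq = tuple(range(6))  # n = 6 jobs
sizes = {name: len(list(op(seq))) for name, op in OPERATORS.items()}
distinct = {name: len(set(op(seq))) for name, op in OPERATORS.items()}
contains_apex = {
    name: set(apex(seq)) <= set(op(seq)) for name, op in OPERATORS.items()
}
```

For $n = 6$ the move counts are 5 for APEX, 30 for DSH and 15 for the others, matching $(n-1)$, $n(n-1)$ and $n(n-1)/2$. In this sketch the 30 DSH moves yield only $(n-1)^2 = 25$ distinct sequences, because a forward shift by one position and the corresponding backward shift both produce the same adjacent swap. Every operator's neighbourhood contains the APEX neighbourhood, which is the subsumption property noted above.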
3. Landscape Analysis

In order to try to understand better what the landscape of such problems looks like, each operator was applied 50 times from different random initial vectors to a number of problem instances of various sizes. This resulted in 50 distinct local optima in each case; the fact that no more than 50 initial start points were needed is a point to which we shall return at a later stage.

3.1. Distance correlations
Apart from measuring distances of local optima from a given point in terms of the 4 metrics introduced above, it is also possible to compare the local optima
Figure 1. 50 local optima plotted in terms of their distances from a global optimum (x-axis), against their relative objective function values (y-axis). The operator used was forward shift, the strategy was next ascent, each repeated from 50 different initial random start points, which led to 50 distinct local optima. In this case the metric used is position-based.
[Scatter plot: x-axis POSN (position-based distance), 40 to 160; y-axis OBJFN (relative objective function value), 0 to 250.]
in terms of their `distances' as measured by their objective function values. If the results of Boese et al. [2] carry over to the flowshop problem, we would expect to find that these distances are in some sense correlated with each other. This can be examined in any particular case by plotting a graph of local optima relative to the global optimum in terms of one of the distance metrics against distance in terms of objective function values. Figure 1 provides an example of such a graph for 50 local optima generated by repeated restarts of a next descent procedure using the first of the 20-job, 5-machine set of problem instances generated by Taillard [13]. These problem instances were chosen for investigation partly because they have become well-known cases of flowshop sequencing, but also because the global optima are known¹ for most if not all of the smaller instances, thus providing a fixed reference point in each case. The first graph appears to confirm that a relationship such as that found by Boese et al. [2] for the TSP also exists in this case of the flowshop sequencing problem: local optima seem to be `close' to each other, which would motivate the

¹ The author is indebted to Vaessens [personal communication] for making available his results on global optima for these instances.
Figure 2. A second example, where the metric used is uni-directional adjacency-based.
[Scatter plot: x-axis ADJ1 (uni-directional adjacency distance), 14 to 19; y-axis OBJFN (relative objective function value), 0 to 250.]
development of adaptive techniques similar to those proposed in [2]. In contrast, such a relationship looks far less likely for the second case, where the metric used is much less effective at discriminating between local optima. However, it is clearly impossible to plot such a graph for every operator, metric and problem instance. What is needed in order to assess any particular case is a simple measure of the relationship between the local optima. We would like to know whether such a relationship exists or not, and if so, how significant it is. At first sight, it might appear that a simple measure would be the correlation coefficient computed from the corresponding entries in the two distance matrices found from the distance metric and the objective function values respectively. Such a value can of course easily be calculated, but interpreting its significance would be difficult, as the sample clearly does not consist of independent random samples (for example, if local optima A and B are close in terms of their objective function values, and B is also close to C, then so are A and C). In these circumstances, we cannot carry out a standard hypothesis test of the correlation coefficient. Fortunately, an alternative strategy is available. This type of problem has been studied in other branches of science, for example in psychology and biology [10], where it has been approached by using a randomization test [9]. This is carried out by repeatedly permuting the labels of the
items in one distance matrix, and re-calculating the correlation coefficient. If this procedure is repeated many (e.g. 1000) times, the fraction of replications in which a correlation coefficient is found that is more extreme than the originally calculated value can be used as an estimated significance level, and the relevance of the value found can be assessed. For example, in the two cases described in the above scatter plots, the calculated correlation coefficients were 0.545 and 0.120. While the second of these does not look very impressive, in fact both are statistically significant at the 0.1% level, on the basis of 1000 replications in a randomization test. A randomization test was carried out for each operator, for each metric and for all the 20/5, 20/10, 20/20, 50/5 and 50/10 sets of problem instances defined by Taillard [13]. For all of these cases the global optimum is now known. The results are displayed in Table 4. It is clear from this table that for the precedence- and position-based metrics, there is nearly always a strong relationship between the distances of the solutions from each other and their corresponding objective function values. This is less clear in the case of the adjacency-based metrics. These results are in line with what was expected, as discussed above. There also appear to be some interesting interactions between metric and operator: the local optima on the APEX landscape, in the case of the larger problems, seem to be much less well correlated with the inter-optima distances for the more sophisticated metrics. Nevertheless, overall there appear to be significant correlations between local optima generated by these operators, whatever metric is used to provide the meaning of `distance'. Thus, the argument put forward by Boese et al. [2] for the existence of `big valleys' in the TSP seems also to be valid in the context of flowshop sequencing.

3.2. Ultrametricity
Another issue is the existence or otherwise of an ultrametric relationship between local optima. Mathematically, an ultrametric is a distance measure $d(\cdot, \cdot)$ that satisfies the usual requirements for a metric, except that the triangle inequality

$$d(x, z) \le d(x, y) + d(y, z)$$
Table 4. Significance of correlations: each cell records the number of P-values (out of 10) that were significant at the 1% level on a randomization test

                       Metric
Operator    Adj-1   Adj-2   Prec   Posn
20/5 problem instances
APEX           1      10     10     10
EX             0       0      8      9
INV            0       8      9     10
FSH            0       8      9     10
BSH            0       6      7      8
DSH            0       0      2      7
20/10 problem instances
APEX           2      10     10     10
EX             0       0      8     10
INV            0      10     10     10
FSH            0       9     10     10
BSH            0       9     10     10
DSH            0       0      2      9
20/20 problem instances
APEX           5      10     10     10
EX             0       0     10     10
INV            0      10     10     10
FSH            0      10     10     10
BSH            0      10     10     10
DSH            0       0      9     10
50/5 problem instances
APEX          10      10      2      2
EX             5       4     10      8
INV            8       6     10      8
FSH            5       3     10     10
BSH            7       2     10      8
DSH           10      10     10      9
50/10 problem instances
APEX          10      10      2      1
EX            10      10     10     10
INV           10      10      9      6
FSH            9       8     10     10
BSH           10      10     10      9
DSH           10      10     10     10
is replaced by the stronger condition

$$d(x, z) \le \max(d(x, y), d(y, z)).$$

(An ultrametric can thus be thought of as one where every `triangle' between 3 points is either equilateral or isosceles with an acute included angle.) This has been suggested by a number of authors [8,1] for some combinatorial optimization problems, although evidence for it seems inconclusive. As Baldi and Baum [1] point out, were such a relationship to exist for a given problem class, it would imply a hierarchical relationship between local optima, a structure which could be exploited in devising an algorithm for the solution of such problems. The number of sets of 3 local optima for which ultrametricity is the case can be computed fairly easily for a given instance. These calculations were carried out for all the 20/5, 20/10, 20/20, 50/5 and 50/10 sets of problem instances defined by Taillard [13]. For all of these cases the global optimum is now known. The results are displayed in Table 5. Ultrametricity seems to be an unlikely phenomenon under the more sophisticated metrics, those which we expect to be more representative of the underlying landscape in the case of flowshop sequencing. Even for the adjacency-based metrics and the APEX operator, there are never more than 50% of ultrametric triangles. The tentative evidence adduced in its favour in earlier work on the TSP may perhaps be attributed to the fact that the operator used was fairly weak (2-opt), while the metric used was adjacency-based.

3.3. Dependencies between operators
As has been observed in the introduction, a local optimum on one landscape (i.e. with respect to one operator) is not necessarily a local optimum on another landscape. In some cases, where a weaker operator is subsumed by a stronger, it is obvious that a local optimum with respect to the stronger cannot be improved by the operator which it subsumes. Thus the local optima of the DSH landscape are all local optima on the APEX, FSH and BSH landscapes. What is interesting from an empirical viewpoint is how often a local optimum on one landscape is capable of being improved on another. Thus, in the experiments reported above, each time a local optimum was reached on one landscape, it was used as a start point on one of the other landscapes in an attempt
Table 5. Ultrametricity: each cell records the average percentage of ultrametric `triangles' of local optima.

                       Metric
Operator    Adj-1   Adj-2   Prec   Posn
20/5 problem instances
APEX         46.3    32.6   2.89   4.51
EX           38.9    29.3   3.71   5.27
INV          42.3    30.3   3.27   4.64
FSH          39.2    30.0   3.39   5.05
BSH          37.6    29.0   3.68   5.39
DSH          34.3    27.3   4.12   5.81
20/10 problem instances
APEX         46.1    33.1   2.75   4.60
EX           35.8    28.3   3.62   5.27
INV          40.5    30.4   3.07   4.64
FSH          35.6    28.6   3.26   4.94
BSH          34.1    27.9   3.55   5.14
DSH          31.4    26.6   3.78   5.38
20/20 problem instances
APEX         46.4    32.7   2.74   4.53
EX           35.8    28.9   3.24   4.93
INV          41.8    31.2   2.79   4.51
FSH          36.0    29.2   3.19   4.99
BSH          35.9    29.0   3.30   4.88
DSH          32.0    27.4   3.25   5.10
50/5 problem instances
APEX         47.5    32.0   0.73   1.12
EX           42.3    29.3   0.85   1.23
INV          46.3    31.1   0.77   1.26
FSH          40.6    29.2   0.80   1.20
BSH          40.2    28.6   0.90   1.29
DSH          37.2    27.0   0.94   1.31
50/10 problem instances
APEX         46.6    31.5   0.74   1.14
EX           37.6    27.6   0.89   1.32
INV          43.3    30.1   0.75   1.15
FSH          37.3    27.5   0.85   1.21
BSH          36.1    27.0   0.90   1.25
DSH          34.3    25.8   0.92   1.14
to improve it. Statistics were computed for each of the 50 local optima generated, and the results are summarized in Tables 6 and 7. There are a few points of note in these tables: firstly, the surprising fact that some (albeit very few) of the APEX-optima could not be improved by the other operators, although the average improvement was of the order of 10 to 20%. Taken together, this suggests that some of the local optima in the other landscapes are not particularly high-quality. Although the same size as EX, FSH and BSH, INV appears to be inferior to them. It is also interesting that overall, EX improves FSH and BSH more than the converse. The dominance of the DSH operator is as expected, although even here, it is possible for EX, in particular, to find an improved solution.
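The dependency experiment of this subsection can be sketched in a few lines. The code below is an illustration under stated assumptions, not the procedure actually used: it runs on a small synthetic random instance (not one of Taillard's), uses only the APEX and EX operators, applies next descent, and merely counts how often a local optimum of operator A has at least one improving neighbour under operator B. All function names are ours.

```python
import random

def makespan(perm, p):
    """C_max of the job sequence perm; p[job][machine] is a processing time."""
    done = [0] * len(p[0])
    for job in perm:
        for j in range(len(done)):
            ready = done[j - 1] if j > 0 else 0
            done[j] = max(done[j], ready) + p[job][j]
    return done[-1]

def apex(s):
    """Adjacent pairwise exchange."""
    for i in range(len(s) - 1):
        yield s[:i] + (s[i + 1], s[i]) + s[i + 2:]

def exchange(s):
    """Exchange of the jobs in any two positions."""
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            t = list(s)
            t[i], t[j] = t[j], t[i]
            yield tuple(t)

def next_descent(s, operator, p):
    """Accept the first improving neighbour until no neighbour improves."""
    cost = makespan(s, p)
    improved = True
    while improved:
        improved = False
        for t in operator(s):
            c = makespan(t, p)
            if c < cost:
                s, cost, improved = t, c, True
                break
    return s, cost

def can_improve(s, operator, p):
    """True if some neighbour of s under operator has a smaller makespan."""
    cost = makespan(s, p)
    return any(makespan(t, p) < cost for t in operator(s))

def dependency_count(op_a, op_b, p, trials, rng):
    """Number of local optima of op_a (from random starts) improvable by op_b."""
    n = len(p)
    count = 0
    for _ in range(trials):
        start = tuple(rng.sample(range(n), n))
        local_opt, _ = next_descent(start, op_a, p)
        count += can_improve(local_opt, op_b, p)
    return count

rng = random.Random(0)  # fixed seed, for repeatability of the sketch only
n_jobs, n_machines, trials = 8, 3, 20
p = [[rng.randint(1, 99) for _ in range(n_machines)] for _ in range(n_jobs)]

apex_then_ex = dependency_count(apex, exchange, p, trials, rng)
ex_then_apex = dependency_count(exchange, apex, p, trials, rng)
```

Because the APEX neighbourhood is contained in the EX neighbourhood, `ex_then_apex` is necessarily zero, which corresponds to the zero APEX columns of Tables 6 and 7; `apex_then_ex` depends on the instance and the random starts.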
4. Multiple Global Optima

A possible objection to the methods and results obtained above is that all distances are measured relative to a single global optimum. However, it would be reasonable to assume that instances of the flowshop sequencing problem might give rise to multiple global optima. In fact, in some cases it is quite easy to generate alternative globally optimal solutions, especially in the case of the makespan objective, by applying a neighbourhood operator to a known global optimum and looking for an improvement of zero in the objective function. However, such solutions are necessarily very close to the first global optimum, and it seems safe to assume that for alternative global optima generated in this way the conclusions reached above are likely to hold. However, it is possible that some global optima might be relatively distant from each other. If this were so, the relative positions of the local optima generated by the above operators might also be affected. In an attempt to explore this question further, we adapted a program from Yamada's work on the jobshop problem [14] and used it to generate 2548 distinct global optima to the first 20-job, 5-machine problem of Taillard's benchmark set [13]. There are almost certainly many more than this; these were generated in one overnight run on a Sun SparcStation. On examining these solutions in greater detail, it was clear that many of them were extremely close to each other, and to the original global optimum used in section 3, in terms of all 4 metrics. Thus the conclusions reached above regarding the 50 local optima were intact. Nevertheless, there were still some global optima that were further apart,
Table 6. Dependencies between local optima; each cell of the table records the average number of trials (out of 50) that method B was able to improve method A for Taillard's problem instances.

                          Method B
Method A    APEX     EX    INV    FSH    BSH    DSH
20/5 problem instances
APEX           0   49.2   47.8   48.2   49.4   49.8
EX             0      0    3.0   17.7   15.9   29.3
INV            0   22.2      0   23.3   28.8   38.9
FSH            0   28.1   17.0      0   31.6   31.6
BSH            0   24.5   16.5   30.8      0   30.8
DSH            0      9    5.2      0      0      0
20/10 problem instances
APEX           0   49.8   47.7   49.9   49.7   50.0
EX             0      0    5.7   27.2   31.5   40.9
INV            0   31.3      0   33.9   40.8   45.1
FSH            0   29.3   13.6      0   37.7   37.7
BSH            0   28.6   17.2   38.6      0   38.6
DSH            0   12.7    6.7      0      0      0
20/20 problem instances
APEX           0   50.0   46.4   49.9   49.8   50.0
EX             0      0    3.7   31.9   33.1   42.0
INV            0   38.8      0   43.8   43.5   48.5
FSH            0   26.6   10.9      0   37.3   37.3
BSH            0   24.5   11.9   39.2      0   39.2
DSH            0   10.2    4.5      0      0      0
50/5 problem instances
APEX           0   49.9   49.7   49.9   50.0   50.0
EX             0      0    4.2   17.1   21.8   33.5
INV            0   31.3      0   29.2   32.1   42.3
FSH            0   34.2   19.6      0   37.5   37.5
BSH            0   32.5   22.1   38.8      0   38.8
DSH            0      7    2.7      0      0      0
50/10 problem instances
APEX           0     50   49.8     50     50     50
EX             0      0    4.5   24.3   20.8   33.9
INV            0   40.3      0   40.6   39.8   46.9
FSH            0   28.1   12.7      0   34.4   34.4
BSH            0   30.8   15.6   40.6      0   40.6
DSH            0   10.7    3.7      0      0      0
Table 7. Dependencies between local optima; each cell of the table records the average percentage improvement of method B over method A for Taillard's problem instances.

                          Method B
Method A    APEX     EX    INV    FSH    BSH    DSH
20/5 problem instances
APEX           0   12.8   10.8   11.5   12.2   14.8
EX             0      0   0.22   1.02   0.86   2.09
INV            0   1.84      0   1.69   2.44   3.99
FSH            0   1.98   1.11      0   2.50   3.15
BSH            0   1.13   0.78   2.12      0   2.51
DSH            0   0.33   0.13      0      0      0
20/10 problem instances
APEX           0   15.5   11.9   14.5   14.8   18.0
EX             0      0   0.12   1.10   1.67   2.63
INV            0   2.85      0   2.22   3.18   5.19
FSH            0   1.38   0.35      0   2.67   3.10
BSH            0   1.33   0.51   2.29      0   2.74
DSH            0   0.26   0.10      0      0      0
20/20 problem instances
APEX           0   13.8    9.7   12.9   13.5   16.4
EX             0      0   0.17   1.10   1.22   2.21
INV            0   3.30      0   3.50   3.47   5.37
FSH            0   1.20   0.31      0   1.96   2.93
BSH            0   1.22   0.36   2.43      0   2.70
DSH            0   0.19   0.06      0      0      0
50/5 problem instances
APEX           0   14.0   12.4   12.9   13.4   15.5
EX             0      0   0.08   0.41   0.62   1.17
INV            0   1.81      0   1.40   1.71   2.93
FSH            0   1.46   0.68      0   1.86   2.31
BSH            0   1.28   0.71   1.58      0   1.88
DSH            0   0.09   0.02      0      0      0
50/10 problem instances
APEX           0   20.0   16.4   18.5   19.2   22.3
EX             0      0   0.08   0.74   0.56   1.54
INV            0   4.47      0   3.28   4.15   6.12
FSH            0   1.70   0.27      0   2.15   2.59
BSH            0   1.68   0.51   2.50      0   2.95
DSH            0   0.32   0.03      0      0      0
at least in terms of the precedence- and position-based metrics. (The adjacency-based metrics, as earlier, did not discriminate very effectively between different solutions, and were ignored.) After some effort, we were able to identify a `most widely separated set' $S$ of global optima from the complete set of 2548. The average distance from the original global optimum of the 50 local optima generated by DSH was first calculated ($\bar{D}$) for each metric. We then applied the criterion that the global optima belonging to the set $S$ should be at least as far from each other as $\bar{D}$ (for each metric). Unfortunately, there were no members of $S$ in this case! Clearly, all of these 2548 global optima were much closer to each other than they were to the average of the 50 local optima. We found it necessary to relax this criterion to $0.4\bar{D}$ before a set of any size was found. With this value, set $S$ had 13 members. We then measured the distances of each of these global optima from each of the 50 local optima using both the precedence- and position-based metrics. It was immediately apparent that, while the actual distances from the different global optima varied by roughly 10%, those local optima that were close to one global optimum were likely to be close to all the others. Conversely, those local optima that were far from one global optimum were far from them all. In fact, there appeared to be considerable agreement between the implied rankings of the local optima relative to the different global optima at all distances. Whether this apparent agreement was significant was tested using Kendall's coefficient of concordance $W$ [7]. For every operator, and for both precedence- and position-based metrics, the P-value was zero to at least 8 places of decimals; in other words, the hypothesis of no agreement between the different rankings was always decisively rejected.
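For reference, the concordance statistic can be computed as follows. This is a minimal sketch using the textbook form $W = 12S / (m^2(n^3 - n))$ for $m$ rankings of $n$ items, with tied distances given average ranks and no tie correction; these choices, the function names and the toy data are assumptions of the sketch, and the exact variant used for the reported P-values is not specified in the text. The returned statistic $m(n-1)W$ is the quantity usually referred to a chi-squared distribution with $n-1$ degrees of freedom.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kendall_w(distances):
    """distances[g][l]: distance from global optimum g to local optimum l.

    Returns (W, chi2), where chi2 = m (n - 1) W.
    """
    m, n = len(distances), len(distances[0])
    rank_rows = [average_ranks(row) for row in distances]
    totals = [sum(row[l] for row in rank_rows) for l in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    w = 12 * s / (m**2 * (n**3 - n))
    return w, m * (n - 1) * w

# Toy data: three 'global optima' that rank four 'local optima' identically,
# and two that rank three 'local optima' in exactly opposite orders.
w_agree, chi2_agree = kendall_w([[1, 2, 3, 4], [10, 20, 30, 40], [5, 6, 7, 8]])
w_oppose, _ = kendall_w([[1, 2, 3], [3, 2, 1]])
```

Complete agreement gives $W = 1$ and two exactly opposed rankings give $W = 0$; values close to 1 correspond to the situation described above, in which local optima near one global optimum are near all the others.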
The implication of this analysis is clear: conclusions regarding distances from one global optimum for this instance can safely be maintained for other global optima, even for ones that are relatively quite far apart. There is also an interesting insight here into the nature of landscape in this instance, which points up possible dangers in our use of the metaphor. In our experience of a 3-dimensional world, the picture conjured up by the term `big valley' is one where a global optimum is surrounded by local optima of progressively increasing average quality as they approach that global optimum. However, from the analysis of this case, the members of one set of quite widely-separated points (the global optima) all bear very nearly the same relationship to all the members of another set (the local optima). It is probable,
therefore, that the regions inhabited by the two sets of points are actually quite sharply divided. Perhaps the `valley' in which the local optima lie does not lead inexorably towards the set of global optima, but comes up against a `barrier' that separates the two groups. This shows an inherent danger in applying the term `landscape' as a description of what happens in combinatorial optimization. The idea of a `big valley' is a helpful metaphor, but we should not imagine it is like a valley in 3 dimensions. There is need for some caution here. This analysis relates to just one problem instance, and it cannot necessarily be extended to all instances. However, when the same approach was attempted for some of Taillard's other benchmarks, it proved impossible to generate more than one global optimum, other than those that were merely trivial modifications of the global optimum used in section 3. In view of the computational effort already expended, we decided to abandon the attempt to verify the findings on another problem instance. Possibly, Taillard's first 20/5 problem is atypical both in the number of global optima it possesses, and in the relatively large distances between them. At least we can say that the analysis of this one problem instance has shown no reason to doubt the conclusions reached earlier.
5. Conclusion
We have outlined an approach to the analysis of landscapes induced by some specific operators in the context of flowshop sequencing. We have discussed some suitable metrics for representing inter-sequence distances, and shown how the significance of distance correlations can be measured by a randomization test. (How well these metrics approximate the `true' distances between sequences is currently the subject of further experimentation. Preliminary results [12] suggest that both the precedence- and position-based metrics are very good approximations.) Using these metrics, we have observed that a `big valley' structure does appear to exist for the problems investigated here, which suggests that the motivation for perturbation approaches is well-founded, and provides a sound explanation for the good performance of these and other methods. We also considered the question of whether these results might relate only to the one specific global optimum used in each case. The analysis of an instance with multiple global optima found no reason why the results should not be generalized. This analysis also showed that there is a need for caution in too readily transferring the aspects of familiar 3-D landscapes to those generated by neighbourhood search methods in combinatorial optimization.
Finally, we return to the point which was made earlier: we have now seen that in some sense the local optima appear to form a `cluster' in the overall search space, yet 50 distinct re-starts still led to 50 distinct local optima. Clearly, therefore, although the local optima form a restricted subset, it is still a very large one. How many local optima there are is difficult to measure, but the number is probably exponential in the problem size. Thus to explore a given number of local optima from random re-starts is likely to take much longer than to use perturbation methods or adaptive re-starts. This gives further reason for investigating such methods.
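As a rough illustration of why 50 distinct local optima from 50 re-starts points to a very large set, the sketch below makes a simplifying assumption that is not part of the analysis above: every re-start is taken to land on one of N local optima with equal probability. Under that assumption the chance of seeing no repeats is the familiar `birthday problem' product, and we can ask how large N must be before 50 distinct outcomes become as likely as not. The function names are hypothetical and chosen only for this illustration.

```python
import math


def prob_all_distinct(num_optima: int, restarts: int) -> float:
    """Probability that `restarts` independent, uniform draws from
    `num_optima` local optima are all different (ASSUMES equal basin sizes)."""
    if restarts > num_optima:
        return 0.0  # pigeonhole: a repeat is certain
    # Product of (1 - i/N) for i = 0..restarts-1, summed in log space.
    log_p = sum(math.log1p(-i / num_optima) for i in range(restarts))
    return math.exp(log_p)


def smallest_plausible_count(restarts: int, threshold: float = 0.5) -> int:
    """Smallest N for which `restarts` distinct outcomes have probability
    at least `threshold` under the uniform assumption."""
    lo = hi = restarts
    # Grow the upper bracket until the threshold is met.
    while prob_all_distinct(hi, restarts) < threshold:
        hi *= 2
    # Binary search: the probability increases monotonically with N.
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_all_distinct(mid, restarts) >= threshold:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    print(smallest_plausible_count(50))
```

Under the uniform assumption, N must reach roughly 1800 before 50 repeat-free re-starts are as likely as not. Unequal basin sizes can only make repeats more likely, so this figure is best read as a conservative lower bound on the number of local optima rather than as an estimate of it.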
Acknowledgements
The comments of the anonymous referees are gratefully acknowledged.
References
[1] P.Baldi and E.B.Baum, Caging and exhibiting ultrametric structures, in: Neural Networks for Computing, ed. J.Denker, American Institute of Physics, 1986.
[2] K.D.Boese, A.B.Kahng and S.Muddu, A new adaptive multi-start technique for combinatorial global optimizations, Operations Research Letters 16 (1994), 101-113.
[3] F.Glover and M.Laguna, Tabu Search. Chapter 3 in: Modern Heuristic Techniques for Combinatorial Problems, ed. C.R.Reeves, Blackwell Scientific Publications, Oxford, 1993. (Recently re-issued (1995) by McGraw-Hill, London.)
[4] C.Hohn and C.R.Reeves, The crossover landscape for the onemax problem, in: Proceedings of the 2nd Nordic Workshop on Genetic Algorithms and their Applications, ed. J.Alander, University of Vaasa Press, Vaasa, Finland, 1996, pp. 27-43.
[5] C.Hohn and C.R.Reeves, Are long path problems hard for genetic algorithms? in: Parallel Problem-Solving from Nature - PPSN IV, ed. H-M Voigt, W.Ebeling, I.Rechenberg and H-P Schwefel, Springer-Verlag, Berlin, 1996, pp. 134-143.
[6] D.S.Johnson, Local optimization and the traveling salesman problem, in: Automata, Languages and Programming, Lecture Notes in Computer Science 443, ed. G.Goos and J.Hartmanis, Springer-Verlag, Berlin, 1990, pp. 446-461.
[7] M.G.Kendall, Rank Correlation Methods, Charles Griffin, London, 1962.
[8] S.Kirkpatrick and G.Toulouse, Configuration space analysis of travelling salesman problems, J.Physique 46 (1985), 1277-1292.
[9] B.F.J.Manly, Randomization and Monte Carlo Methods in Biology, Chapman and Hall, London, 1991.
[10] N.Mantel, The detection of disease clustering and a generalized regression approach, Cancer Research 27 (1967), 209-220.
[11] O.Martin, S.W.Otto and E.W.Felten, Large step Markov chains for the TSP incorporating local search heuristics, Operations Research Letters 11 (1992), 219-224.
[12] C.R.Reeves and T.Yamada, Distance measures in the permutation flowshop problem, Technical Report, School of Mathematical and Information Sciences, Coventry University, UK, 1997.
[13] E.Taillard, Benchmarks for basic scheduling problems, European Journal of OR 64 (1993), 278-285.
[14] T.Yamada and R.Nakano, Scheduling by genetic local search with multi-step crossover, in: Parallel Problem-Solving from Nature - PPSN IV, ed. H-M Voigt, W.Ebeling, I.Rechenberg and H-P Schwefel, Springer-Verlag, Berlin, 1996, pp. 960-969.
[15] G.Zweig, An effective tour construction and improvement procedure for the traveling salesman problem, Operations Research 43 (1995), 1049-1057.