Determining Whether A Problem Characteristic Affects Heuristic Performance

A rigorous Design of Experiments approach

Enda Ridge and Daniel Kudenko
The Department of Computer Science, The University of York, U.K.
[email protected]; [email protected]

Summary. This chapter presents a rigorous Design of Experiments (DOE) approach for determining whether a problem characteristic affects the performance of a heuristic. Specifically, it reports a study on the effect of the cost matrix standard deviation of symmetric Travelling Salesman Problem (TSP) instances on the performance of Ant Colony Optimisation (ACO) heuristics. Results demonstrate that for a given instance size, an increase in the standard deviation of the cost matrix of instances results in an increase in the difficulty of the instances. This implies that for ACO, it is insufficient to report results on problems classified only by problem size, as has been commonly done in most ACO research to date. Some description of the cost matrix distribution is also required when attempting to explain and predict the performance of these heuristics on the TSP. The study should serve as a template for similar investigations with other problems and other heuristics.

1 Introduction and Motivation

Ant Colony Optimisation (ACO) algorithms [8] are a relatively new class of stochastic metaheuristic for typical Operations Research (OR) problems of discrete combinatorial optimisation. To date, research has yielded important insights into ACO behaviour and its relation to other heuristics. However, there has been no rigorous study of the relationship between ACO algorithms and the difficulty of problem instances. Specifically, in researching ACO algorithms for the Travelling Salesperson Problem (TSP), it has generally been assumed that problem instance size is the main indicator of problem difficulty. Cheeseman et al [4] have shown that there is a relationship between the standard deviation of the edge lengths of a TSP instance and the difficulty of the problem for an exact algorithm. This leads us to wonder whether the standard deviation of edge lengths may also have a significant effect on problem difficulty for the ACO heuristics. Intuitively, it would seem so. An integral component of the construct solutions phase of ACO algorithms depends on the relative lengths of edges in the TSP. These edge lengths are often stored in a TSP cost matrix.


The probability with which an artificial ant chooses the next node in its solution depends, among other things, on the relative length of the edges connecting to the nodes being considered. This study hypothesises that a high variance in the distribution of edge lengths results in a problem with a different difficulty to a problem with a low variance in the distribution of edge lengths.

This research question is important for several reasons. Current research on ACO algorithms for the TSP does not report the problem characteristic of standard deviation of edge lengths. Assuming that such a problem characteristic affects performance, this means that for instances of the same or similar sizes, differences in performance are confounded with possible differences in standard deviation of edge lengths. Consequently, too much variation in performance is attributed to problem size and none to edge length standard deviation. Furthermore, in attempts to model ACO performance, all important problem characteristics must be incorporated into the model so that the relationship between problems, tuning parameters and performance can be understood. With this understanding, performance on a new instance can be satisfactorily predicted given the salient characteristics of the instance.

This study focuses on Ant Colony System (ACS) [7] and Max-Min Ant System (MMAS) [26], since the field frequently cites these as its best performing algorithms. This study uses the TSP. The difficulty of solving the TSP to optimality, despite its conceptually simple description, has made it a very popular problem for the development and testing of combinatorial optimisation techniques. The TSP "has served as a testbed for almost every new algorithmic idea, and was one of the first optimization problems conjectured to be 'hard' in a specific technical sense" [16, p. 37]. This is particularly so for algorithms in the Ant Colony Optimisation (ACO) field, where "a good performance on the TSP is often taken as a proof of their usefulness" [8, p. 65].

The study emphasises the use of established Design of Experiments (DOE) [19] techniques and statistical tools to explore data and test hypotheses. It thus addresses many concerns raised in the literature over the field's lack of experimental rigour [9, 22, 13]. The designs and analyses from this chapter can be applied to other stochastic heuristics for the TSP and to the other problem types that heuristics solve. In fact, there is an increasing awareness of the need for the experiment designs and statistical analysis techniques that this chapter illustrates [2].

The next section gives a brief background on the study's ACO algorithms, ACS and MMAS. Section 3 describes the research methodology. Sections 4 and 5 describe the results from the experiments. Related work is covered in Section 6. The chapter ends with conclusions and directions for future work.


2 Background

Given a number of cities and the costs of travelling from any city to any other city, the Travelling Salesperson Problem (TSP) is the problem of finding the cheapest round-trip route that visits each city exactly once. This has application in problems of traffic routing and manufacture, among others [17]. The TSP is best represented by a graph of nodes (representing cities) and edges (representing the costs of visiting cities).

Ant Colony Optimisation algorithms are discrete combinatorial optimisation heuristics inspired by the foraging activities of natural ants. Broadly, the ACO algorithms work by placing a set of artificial ants on the TSP nodes. The ants build TSP solutions by moving between nodes along the graph edges. These movements are probabilistic and are influenced both by a heuristic function and by the levels of a real-valued marker called a pheromone. Their movement decisions also favour nodes that are part of a candidate list, a list of the least costly cities from a given city. The iterated activities of the artificial ants lead to some combinations of edges becoming more reinforced with pheromone than others. Eventually the ants converge on a solution; a sketch of this probabilistic construction step is given at the end of this section.

It is common practice to hybridise ACO algorithms with local search [14] procedures. This study focuses on ACS and MMAS as constructive heuristics and so omits any such procedure. This does not detract from the proposed experiment design, methodology and analysis. The interested reader is referred to a recent text [8] for further information on ACO algorithms.
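To make the construction step concrete, here is a minimal sketch of the standard ACO random-proportional rule in Python. It is an illustration under our own naming assumptions, not the authors' implementation: tau is the pheromone matrix, cost the TSP cost matrix, alpha and beta the pheromone and heuristic emphasis parameters (listed later in Table 1), and candidates the per-city candidate lists.

    import numpy as np

    def choose_next_city(current, unvisited, tau, cost, alpha, beta,
                         candidates, rng):
        """One construction step: choose the next city with probability
        proportional to pheromone^alpha * (1/cost)^beta (the
        random-proportional rule), preferring the candidate list."""
        # Restrict the choice to unvisited members of the candidate list,
        # falling back to all unvisited cities when the list is exhausted.
        options = [c for c in candidates[current] if c in unvisited]
        if not options:
            options = sorted(unvisited)
        tau_w = tau[current, options] ** alpha          # pheromone influence
        eta_w = (1.0 / cost[current, options]) ** beta  # heuristic influence
        weights = tau_w * eta_w
        return rng.choice(options, p=weights / weights.sum())

ACS refines this rule with its exploration threshold q0, greedily taking the single best-weighted edge with probability q0, while MMAS bounds the pheromone values between explicit limits; both refinements are omitted here for brevity.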

3 Method

This section describes the general experiment design issues relevant to this chapter. Others have discussed these in detail [15] for heuristics in general. Further information on Design of Experiments in general [19] and its adaptation for heuristic tuning [21] is available in the literature.

3.1 Response Variables

Response variables are those variables we measure because they represent the effects which interest us. A good reflection of problem difficulty is the solution quality that a heuristic produces, and so in this study solution quality is the response of interest. Birattari [3] briefly discusses measures of solution quality. He dismisses the use of relative error since it is not invariant under some transformations of the problem, as first noted by Zemel [28]. An example is given of how an affine transformation(1) of the distance between cities in the TSP leaves a problem that is essentially the same but has a different relative error of solutions. Birattari instead uses a variant of Zemel's differential approximation measure [29], defined as:

    c_{de}(c, i) = \frac{c - \bar{c}_i}{c_i^{rnd} - \bar{c}_i}        (1)

where c_{de}(c, i) is the differential error of a solution to instance i with cost c, \bar{c}_i is the optimal solution cost and c_i^{rnd} is the expected cost of a random solution to instance i. An additional feature of this Adjusted Differential Approximation (ADA) is that its value when applied to a randomly generated solution will be 1. The measure therefore indicates how good a method is relative to the most trivial random method. In this way it can be considered as incorporating a lower bracketing standard [5]. Nonetheless, percentage relative error from optimum is still the more popular solution quality measure, and so this study records and analyses both measures. The two measures will be highly correlated, but it is worthwhile to analyse them separately and to see what effect the choice of response has on the study's conclusions. Concorde [1] was used to calculate the optima of the instances. Expected values of random solutions were calculated as the average of 200 solutions generated by randomly permuting the order of cities to create tours.

(1) An affine transformation is any transformation that preserves collinearity (i.e., all points lying on a line initially still lie on a line after transformation) and ratios of distances (e.g., the midpoint of a line segment remains the midpoint after transformation). Geometric contraction, expansion, dilation, reflection, rotation, and shear are all affine transformations.
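Both response measures are simple functions of tour costs. The following is a minimal sketch, assuming a square cost matrix; the 200-tour average mirrors the random-solution estimate described above, though all function names are our own.

    import numpy as np

    def tour_cost(tour, cost):
        """Cost of a cyclic tour through all cities, given a cost matrix."""
        n = len(tour)
        return sum(cost[tour[k], tour[(k + 1) % n]] for k in range(n))

    def expected_random_cost(cost, n_samples=200, seed=0):
        """Estimate c_i^rnd: the average cost of randomly permuted tours."""
        rng = np.random.default_rng(seed)
        n = cost.shape[0]
        return np.mean([tour_cost(rng.permutation(n), cost)
                        for _ in range(n_samples)])

    def relative_error(c, c_opt):
        """Percentage relative error from the optimum."""
        return 100.0 * (c - c_opt) / c_opt

    def ada(c, c_opt, c_rnd):
        """Equation (1): 0 for an optimal tour, 1 on average for a random tour."""
        return (c - c_opt) / (c_rnd - c_opt)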

3.2 Instances

Problem instances were created using a modification of the portmgen generator from the DIMACS TSP challenge [11]. The original portmgen created a cost matrix by choosing edge lengths uniformly at random within a certain range. We adjusted the generator so that edge costs could be drawn from any distribution. In particular, we followed Cheeseman et al's [4] approach and drew edge lengths from a Log-Normal distribution. Although Cheeseman et al did not state their motivation for using such a distribution, a plot of the normalised relative frequencies of the normalised edge costs of instances from a popular online benchmark library, TSPLIB [20], shows that the majority have a Log-Normal shape (Figure 1). An appropriate choice of inputs to our modified portmgen results in edges with a Log-Normal distribution and a desired mean and standard deviation. Figure 2 shows the normalised relative frequencies of the normalised edge costs of three generated instances.

Standard deviation of TSP edge lengths was varied across five levels: 10, 30, 50, 70 and 100. Three problem sizes, 300, 500 and 700, were used in the experiments. The same set of instances was used for the ACS and MMAS heuristics, and the same instance was used for replicates of a design point.

[Figure 1. Normalised relative frequency of normalised edge lengths of TSPLIB instance bier127. The normalised distribution of edge lengths demonstrates a characteristic Log-Normal shape.]

[Figure 2. Normalised relative frequencies of normalised edge lengths for three instances of the same size and same mean cost (mean 100; standard deviations 10, 30 and 70). Instances are distinguished by their standard deviation.]
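We do not reproduce the modified portmgen here, but the key step, drawing edge costs from a Log-Normal distribution with a desired mean and standard deviation, can be sketched as follows. The mean/standard-deviation conversion is the standard Log-Normal identity; assembling the draws into a symmetric cost matrix is our own assumption about how such a generator might be structured.

    import numpy as np

    def lognormal_cost_matrix(n, mean, std, seed=0):
        """Symmetric TSP cost matrix whose edge costs follow a Log-Normal
        distribution with the given mean and standard deviation."""
        # Standard identities: if X ~ LogNormal(mu, sigma) then
        #   E[X] = exp(mu + sigma^2 / 2),  Var[X] = (exp(sigma^2) - 1) E[X]^2.
        sigma2 = np.log(1.0 + (std / mean) ** 2)
        mu = np.log(mean) - sigma2 / 2.0
        rng = np.random.default_rng(seed)
        edges = rng.lognormal(mu, np.sqrt(sigma2), size=(n, n))
        cost = np.triu(edges, k=1)   # keep one draw per undirected edge
        cost = cost + cost.T         # make the matrix symmetric
        np.fill_diagonal(cost, 0.0)
        return cost

Calling this with a mean of 100 and standard deviations of 10, 30 and 70 should reproduce the distribution shapes of Figure 2.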

3.3 Factors, Levels and Ranges

There are two types of factors, the variables thought to influence the responses. Design factors are those that we vary in anticipation that they have an effect on the responses. Held-constant factors may have an effect on the response but are not of interest and so are fixed at the same value during all experiments.

Design Factors

There were two design factors. The first was the standard deviation of edge lengths in an instance. This was a fixed factor, since its levels were set by the experimenter. Five levels were used: 10, 30, 50, 70 and 100. The second factor was the individual instances with a given level of standard deviation of edge lengths. This was a random factor, since instance uniqueness was caused by the problem generator and so was not under the experimenter's direct control. Ten instances were created within each level of edge length standard deviation.

Held-constant Factors

Computations for pheromone update were limited to the candidate list length (Section 2). Both problem size and edge length mean were fixed for a given experiment. The held-constant tuning parameter settings for the ACS and MMAS heuristics are listed in Table 1.

Parameter                Symbol      ACS         MMAS
Ants                     m           10          25
Pheromone emphasis       α           1           1
Heuristic emphasis       β           2           2
Candidate list length                15          20
Exploration threshold    q0          0.9         N/A
Pheromone decay          ρ_global    0.1         0.8
Pheromone decay          ρ_local     0.1         N/A
Solution construction                Sequential  Sequential

Table 1. Parameter settings for the ACS and MMAS algorithms. Values are taken from the original publications [8, 26]. See these for a description of the tuning parameters and the MMAS and ACS heuristics.
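For use in later sketches, the settings of Table 1 can be transcribed into a plain configuration mapping; the key names are our own, not from the original code.

    # Tuning parameters from Table 1, taken from the original publications [8, 26].
    PARAMS = {
        "ACS":  {"ants": 10, "alpha": 1, "beta": 2, "candidate_list": 15,
                 "q0": 0.9, "rho_global": 0.1, "rho_local": 0.1,
                 "construction": "sequential"},
        "MMAS": {"ants": 25, "alpha": 1, "beta": 2, "candidate_list": 20,
                 "q0": None, "rho_global": 0.8, "rho_local": None,
                 "construction": "sequential"},
    }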

These tuning parameter values were used because they are listed in the field’s main book [8] and are often adopted in the literature. It is important to stress that this research’s use of parameter values from the literature by no means implies support for such a ‘folk’ approach to parameter selection in general. Selecting parameter values as done here strengthens the study’s conclusions in two ways. It shows that results were not contrived by searching for a unique set of tuning parameter values that would demonstrate the hypothesised effect. Furthermore, it makes the research conclusions applicable to all other research that has used these tuning parameter settings without the justification of a methodical tuning procedure. Recall from the motivation (Section 1) that demonstrating an effect of edge length standard deviation on performance with even one set of tuning parameter values is sufficient to merit


the factor's consideration in parameter tuning studies. The results from this research have been confirmed by such studies [23, 25].

3.4 Experiment design, power and replicates

This study uses a two-stage nested (or hierarchical) design. Consider this analogy. A company receives stock from several suppliers. It tests the quality of this stock by taking 10 samples from each supplier's batch. It wishes to determine whether there is a significant overall difference in supplier quality and whether there is a significant quality difference between samples within a supplier's batch. Supplier and sample are factors. A full factorial design(2) of the supplier and sample factors is inappropriate because samples are unique to their supplier. The nested design accounts for this uniqueness by grouping samples within a given level of supplier. Supplier is termed the parent factor and sample the nested factor.

A similar situation arises in this research. An algorithm encounters TSP instances with different levels of standard deviation of the cost matrix. We want to determine whether there is a significant overall difference in algorithm solution quality for different levels of standard deviation. We also want to determine whether there is a significant difference in algorithm solution quality between instances that have the same standard deviation. Figure 3 illustrates the two-stage nested design schematically.

(2) A full factorial design is one in which all levels of all factors are completely crossed with one another.

[Figure 3. Schematic of the two-stage nested design with r replicates (adapted from [19]): two levels of the parent factor, nested factor levels 1-3 and 4-6 beneath them, and observations y_{ijk}. Note the nested factor numbering, which emphasises the uniqueness of the nested factor levels within a given level of the parent factor.]

The standard deviation of the generated instance is the parent factor and the individual instance number is the nested factor. Therefore, an individual

treatment consists of running the algorithm on a particular instance generated with a particular standard deviation. This design applies to an instance of a given size and therefore cannot capture possible interactions between instance size and instance standard deviation. Capturing such interactions would require a more complicated crossed nested design. This research uses the simpler design to demonstrate that standard deviation is important for a given size. Interactions are captured in the more complicated designs used in tuning ACS [23] and MMAS [25].

The heuristics are probabilistic, and so repeated runs with identical inputs (instances, parameter settings etc.) will produce different results. All treatments are thus replicated in a work-up procedure [6]. This involved adding replicates to the designs until a sufficient power of 80% was reached, given the study's significance level of 1% and the variability of the data collected. Power was calculated with Lenth's power calculator [18]. For all experiments, 10 replicates were sufficient to meet these requirements.

3.5 Performing the experiment

Randomised run order

Available computational resources necessitated running experiments across a variety of similar machines. Runs were executed in a randomised order across these machines to counteract any uncontrollable nuisance factors(3). While such randomising is strictly not necessary when measuring a machine-independent response, it is good practice nonetheless.

(3) A nuisance factor is one that we know affects the responses but have no control over. Two typical examples in experiments with heuristics are network traffic and background CPU processes.

Stopping Criterion

The choice of stopping criterion for an experiment run is difficult when heuristics can continue to run and improve indefinitely. CPU time is discouraged by some as an irreproducible metric [15], and so a time-independent metric such as a combinatorial count of an algorithm operation is often used. A problem with this approach is that the choice of combinatorial count can bias results: should we stop after 1000 iterations or 1001? We mitigate this concern by taking evenly spaced measurements over 5000 iterations of the algorithms and separately analysing the data at all measurement points. Note that a more formal detection of possible differences introduced by different stopping criteria would have required a different experiment design.
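Pulling Sections 3.2 to 3.5 together, the experiment schedule (5 standard deviation levels x 10 instances x 10 replicates, run in randomised order, with the response recorded at evenly spaced checkpoints up to 5000 iterations) might be generated as in this sketch; run_algorithm is a hypothetical callback standing in for an actual heuristic run, not part of the original code.

    import itertools
    import random

    STDEV_LEVELS = [10, 30, 50, 70, 100]
    N_INSTANCES = 10           # nested factor: instances per stdev level
    N_REPLICATES = 10          # replicates per design point (Section 3.4)
    CHECKPOINTS = [1000, 2000, 3000, 4000, 5000]   # measurement points

    def build_schedule(seed=0):
        """All treatments, shuffled to randomise run order across machines."""
        runs = list(itertools.product(STDEV_LEVELS, range(N_INSTANCES),
                                      range(N_REPLICATES)))
        random.Random(seed).shuffle(runs)
        return runs

    def execute(schedule, run_algorithm):
        """Collect solution quality at each checkpoint for each run.
        run_algorithm(stdev, instance, replicate, checkpoints) is assumed
        to return a dict mapping iteration count to best-so-far quality."""
        records = []
        for sd, inst, rep in schedule:
            quality = run_algorithm(sd, inst, rep, CHECKPOINTS)
            for iters in CHECKPOINTS:
                records.append({"stdev": sd, "instance": inst,
                                "replicate": rep, "iterations": iters,
                                "quality": quality[iters]})
        return records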


4 Analysis

The two-stage nested designs were analysed with the General Linear Model. Standard deviation was treated as a fixed factor, since we explicitly chose its levels, and instance was treated as a random factor. The technical reasons for this decision in the context of experiments with heuristics have recently been well explained in the heuristics literature [2].

4.1 Transformations

To make the data amenable to statistical analysis, a transformation of the responses was required for each analysis. The transformations were either a log10 transformation, an inverse square root transformation or a square root transformation.

4.2 Outliers

An outlier is a data value that is either unusually large or unusually small relative to the rest of the data. Outliers are important because their presence can distort the data and render statistical analyses inaccurate. There are several approaches to dealing with outliers. In this research, outliers were deleted and the model building repeated until the models passed the usual diagnostics for the ANOVA assumptions of model fit, normality, constant variance, time-dependent effects and leverage. Figure 4 lists the number of outliers deleted during the analysis of each experiment.

Algorithm   Problem size   Relative Error   ADA
ACS         300            3                3
ACS         500            0                2
ACS         700            5                4
MMAS        300            3                7
MMAS        500            1                2
MMAS        700            4                2

Fig. 4. Number of outliers deleted during the analysis of each experiment. Each experiment had a total of 500 data points.

Further details on the analyses and diagnostics are available in many textbooks [19].
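For a balanced layout such as this one, the two-stage nested analysis can be reproduced by hand. The sketch below computes the classical nested ANOVA with numpy: because the nested instance factor is random, the fixed parent factor (standard deviation) is tested against the instance-within-standard-deviation mean square, and the nested factor against the residual. It is a bare-bones illustration, not the full General Linear Model workflow with transformations, diagnostics and outlier deletion used in the study.

    import numpy as np
    from scipy import stats

    def nested_anova(y):
        """Balanced two-stage nested ANOVA.
        y has shape (a, b, r): a parent levels (stdev), b nested levels
        (instances within each stdev level) and r replicates.
        Returns (F, p) for the parent factor and for the nested factor."""
        a, b, r = y.shape
        grand = y.mean()
        parent_means = y.mean(axis=(1, 2))    # shape (a,)
        cell_means = y.mean(axis=2)           # shape (a, b)

        ss_parent = b * r * np.sum((parent_means - grand) ** 2)
        ss_nested = r * np.sum((cell_means - parent_means[:, None]) ** 2)
        ss_error = np.sum((y - cell_means[:, :, None]) ** 2)

        df_parent, df_nested, df_error = a - 1, a * (b - 1), a * b * (r - 1)
        ms_parent = ss_parent / df_parent
        ms_nested = ss_nested / df_nested
        ms_error = ss_error / df_error

        # Nested factor is random: test the fixed parent factor against the
        # nested mean square, and the nested factor against the error.
        f_parent = ms_parent / ms_nested
        f_nested = ms_nested / ms_error
        p_parent = stats.f.sf(f_parent, df_parent, df_nested)
        p_nested = stats.f.sf(f_nested, df_nested, df_error)
        return (f_parent, p_parent), (f_nested, p_nested)

    # Example: log10-transform the response first, as in Section 4.1.
    # (f_sd, p_sd), (f_inst, p_inst) = nested_anova(np.log10(quality_array))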


5 Results

Figures 5 to 8 illustrate the results for both algorithms on problems of size 300 and size 700. Problems of size 500 showed the same trend and so are omitted. In all cases, the effect of standard deviation on solution quality was deemed statistically significant at the p < 0.01 level. The effect of instance was also deemed statistically significant. However, an examination of the plots that follow shows that only standard deviation has a practically significant effect.

In each of these box-plots, the horizontal axis shows the standard deviation of the instances' cost matrices at five levels. This is repeated along the horizontal axis at each of the measurement points used. The vertical axis shows the solution quality response in its original scale. There is a separate plot for each algorithm and each problem size. Outliers have been included in the plots.

An increase in problem standard deviation gives an increase in the relative error of the solutions produced: instances with a higher standard deviation are more difficult to solve. Conclusions from the data were the same at all measurement points.

[Figure 5. Results of ACS applied to problems of size 300: box-plots of relative error against problem standard deviation (10, 30, 50, 70, 100) at measurement points of 1000, 2000, 3000, 4000 and 5000 iterations.]

[Figure 6. Results of ACS applied to problems of size 700: box-plots of relative error against problem standard deviation at the same five measurement points.]

[Figure 7. Results of MMAS applied to problems of size 300: box-plots of relative error against problem standard deviation at the same five measurement points.]

[Figure 8. Results of MMAS applied to problems of size 700: box-plots of relative error against problem standard deviation at the same five measurement points.]

The opposite trend was observed in the Adjusted Differential Approximation response. Figure 9 illustrates one such trend: the ADA value decreases as edge length standard deviation increases. This can be understood by examining a plot of the components of ADA (Equation (1)): the absolute solution value, the optimal solution value and the expected random solution value (Section 3.1). While increasing standard deviation has no effect on the expected random solution, it has a strong effect on both the optimal and absolute solution values. As standard deviation increases, the ACO heuristics' performance deteriorates, coming closer to the performance of a random solution.

[Figure 9. Results of ACS applied to problems of size 300: box-plots of ADA against problem standard deviation at the same five measurement points.]

[Figure 10. Plots of the values affecting the ADA calculation: the expected random, optimum and absolute solution values against the standard deviation of edge lengths. The absolute and optimum solution values decrease with increasing standard deviation of edge lengths; the expected random solution value remains unchanged.]

6 Related Work

There has been some related work on problem difficulty for exact and heuristic algorithms. Cheeseman et al [4] investigated the effect of cost matrix standard deviation on the difficulty of Travelling Salesperson Problems for an exact algorithm. Three problem sizes of 16, 32 and 48 were investigated. For each problem size, many instances were generated such that each instance had the same mean cost but a varying standard deviation of cost. This varying standard deviation followed a Log-Normal distribution. The computational effort for an exact algorithm to solve each of these instances was measured and plotted against the standard deviation of the cost matrix. This study differs from that of Cheeseman et al [4] in that it uses larger problem sizes and a heuristic algorithm rather than an exact algorithm. Furthermore, its conclusions are reinforced with a rigorous DOE approach and statistical analyses.

Fischer et al [10] investigated the influence of Euclidean TSP structure on the performance of two algorithms, one exact and one heuristic. The former was branch-and-cut [1] and the latter was the iterated Lin-Kernighan algorithm [12]. In particular, the TSP structural characteristic investigated was the distribution of cities in Euclidean space. The authors varied this distribution by taking a structured problem instance and applying an increasing perturbation to the city distribution until the instance resembled a randomly distributed problem. There were two perturbation operators. A reduction operator removed between 1% and 75% of the cities in the original instance.


A shake operator offset cities from their original location. Using 16 original instances, 100 perturbed instances were created for each of 8 levels of the perturbation factor. Performance on perturbed instances was compared to performance on 100 instances created by distributing cities uniformly at random in a square. Predictably, increased perturbation led to increased solution times, closer to the times for a completely random instance of the same size. It was therefore concluded that random Euclidean TSP instances are relatively hard to solve compared to structured instances. The reduction operator confounded changed problem structure with a reduction in problem size, a known factor in problem difficulty. This chapter fixes problem size and cost matrix mean and controls cost matrix standard deviation, thus avoiding any such confounding.

Most recently, Van Hemert [27] evolved problem instances of a fixed size that were difficult to solve for two heuristics: Chained Lin-Kernighan and Lin-Kernighan with Cluster Compensation. TSP instances of size 100 were created by uniformly randomly selecting 100 coordinates from a 400x400 grid. This appears to be a similar generation approach to the portgen generator from the DIMACS TSP challenge. An initial population of such instances was evolved for each of the algorithms, where higher fitness was assigned to instances that required a greater effort to solve. This effort was a combinatorial count of the algorithms' most time-consuming procedure. Van Hemert then analysed the evolved instances using several interesting metrics. His aim was to determine whether the evolutionary procedure made the instances more difficult to solve and whether that difficulty was specific to the algorithm. To verify whether difficulty-inducing properties were shared between algorithms, each algorithm was run on the other algorithm's evolved problem set. A set evolved for one algorithm was less difficult for the other algorithm. However, the alternative evolved set still required more effort than the random set, indicating that some difficult instance properties were shared by both evolved problem sets.

Our approach began with a specific hypothesis about a single problem characteristic and its effect on problem hardness. Van Hemert's, by contrast, evolved hard instances and then attempted to infer, post hoc, which characteristics might be responsible for that hardness. If the researcher's aim is to stress test a heuristic, then we believe Van Hemert's approach is more appropriate. The approach presented here is appropriate when isolating a specific problem characteristic that may affect problem hardness.

To our knowledge, this chapter presents the first rigorous experiments on the hardness of problem instances for ACO heuristics. Recent results in screening the factors affecting ACO performance [24] and on modelling the parameter, problem instance and performance relationship [23, 25] confirm the results of this study.


7 Conclusions

Our conclusions from the aforementioned results are as follows. For the Stützle implementations of ACS and MMAS, applied to symmetric TSP instances generated with log-normally distributed edge lengths such that all instances have a fixed cost matrix mean of 100 and a cost matrix standard deviation varying from 10 to 100:

1. A change in cost matrix standard deviation leads to a statistically and practically significant change in the difficulty of the problem instances for these algorithms.
2. There is no practically significant difference in difficulty between instances that have the same size, cost matrix mean and cost matrix standard deviation.
3. There is no practically significant difference between the difficulty measured after 1000 algorithm iterations and 5000 algorithm iterations.
4. Conclusions were the same for the relative error and Adjusted Differential Approximation responses.

7.1 Implications

These results are important for the ACO community for the following reasons:

• They demonstrate, in a rigorous, designed-experiment fashion, that the solution quality of an ACO TSP algorithm is affected by the standard deviation of the cost matrix.
• They demonstrate that cost matrix standard deviation must be considered as a factor when building predictive models of ACO TSP algorithm performance.
• They clearly show that performance analysis papers using ACO TSP algorithms must report instance cost matrix standard deviation as well as instance size, since two instances of the same size can differ significantly in difficulty.
• They motivate an improvement in benchmark libraries so that they provide a wider crossing of both instance size and instance cost matrix standard deviation.

7.2 Assumptions and restrictions

For completeness and clarity, we state that this research does not examine the following issues.

• It did not examine clustered problem instances or grid problem instances. These are other common forms of the TSP in which nodes appear in clusters or in a very structured grid pattern respectively. The conclusions should not be applied to other TSP types without a repetition of this case study.
• Algorithm performance was not being examined, since no claim was made about the suitability of the parameter values for the instances encountered. Rather, the aim was to demonstrate an effect of standard deviation and so argue that it should be included as a factor in experiments that do examine algorithm performance.
• We cannot make a direct comparison between algorithms, since the algorithms were not tuned methodically. That is, we are not entitled to say that ACS did better than MMAS on, say, instance X with a standard deviation of Y.
• We cannot make a direct comparison of the response values for different sized instances. Clearly, 3000 iterations explores a bigger fraction of the search space for 300-city problems than for 500-city problems. Such a comparison could be made if it were clear how to scale iterations with problem size. Such scaling is an open question.

8 Future Work

There are several avenues of future work leading from this study. The same analysis is worthwhile for other popular ACO algorithms. The code provided by Stützle also has implementations of Best-Worst Ant System, Ant System, Rank-Based Ant System and Elitist Ant System. The use of a well-established Design of Experiments approach with two-stage nested designs and analysis with the General Linear Model could also be applied to other heuristics for the TSP. It is important that we introduce such rigour into the field so that we can move from the competitive testing of highly engineered designs to the scientific evaluation of hypotheses about algorithms.

Recall that we have used fixed algorithm parameter settings from the literature. Screening experiments and Response Surface models have since established which of these parameters interact with cost matrix standard deviation to affect performance [23, 25]. Further Design of Experiments studies are needed to confirm these results and to investigate similar research questions for other heuristics.

References

1. D. Applegate, R. Bixby, V. Chvatal, and W. Cook. Implementing the Dantzig-Fulkerson-Johnson algorithm for large traveling salesman problems. Mathematical Programming Series B, 97(1-2):91-153, 2003.
2. J. Bang-Jensen, M. Chiarandini, Y. Goegebeur, and B. Jørgensen. Mixed Models for the Analysis of Local Search Components. In Engineering Stochastic Local Search Algorithms. Designing, Implementing and Analyzing Effective Heuristics, volume 4638 of Lecture Notes in Computer Science, pages 91-105. Springer, Berlin / Heidelberg, 2007.
3. M. Birattari. The Problem of Tuning Metaheuristics. PhD thesis, Faculté des Sciences Appliquées, Université Libre de Bruxelles, 2006.
4. P. Cheeseman, B. Kanefsky, and W. M. Taylor. Where the Really Hard Problems Are. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, volume 1, pages 331-337. Morgan Kaufmann Publishers, Inc., USA, 1991.
5. P. R. Cohen. Empirical Methods for Artificial Intelligence. The MIT Press, Cambridge, Massachusetts, 1995.
6. A. Czarn, C. MacNish, K. Vijayan, B. Turlach, and R. Gupta. Statistical Exploratory Analysis of Genetic Algorithms. IEEE Transactions on Evolutionary Computation, 8(4):405-421, 2004.
7. M. Dorigo and L. M. Gambardella. Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation, 1(1):53-66, 1997.
8. M. Dorigo and T. Stützle. Ant Colony Optimization. The MIT Press, Massachusetts, USA, 2004.
9. A. Eiben and M. Jelasity. A critical note on experimental research methodology in EC. In Proceedings of the 2002 IEEE Congress on Evolutionary Computation, pages 582-587. IEEE, 2002.
10. T. Fischer, T. Stützle, H. Hoos, and P. Merz. An Analysis of the Hardness of TSP Instances for Two High Performance Algorithms. In Proceedings of the Sixth Metaheuristics International Conference, pages 361-367, 2005.
11. M. Goldwasser, D. S. Johnson, and C. C. McGeoch, editors. Proceedings of the Fifth and Sixth DIMACS Implementation Challenges. American Mathematical Society, 2002.
12. K. Helsgaun. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106-130, 2000.
13. J. N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1:33-42, 1996.
14. H. Hoos and T. Stützle. Stochastic Local Search, Foundations and Applications. Morgan Kaufmann, 2004.
15. D. S. Johnson. A Theoretician's Guide to the Experimental Analysis of Algorithms. In Proceedings of the Fifth and Sixth DIMACS Implementation Challenges, pages 215-250. American Mathematical Society, 2002.
16. D. S. Johnson and C. H. Papadimitriou. Computational Complexity. In E. L. Lawler, J. K. Lenstra, A. H. G. R. Kan, and D. B. Shmoys, editors, The Traveling Salesman Problem, Wiley Series in Discrete Mathematics and Optimization, pages 37-85. John Wiley and Sons, 1995.
17. E. L. Lawler, J. K. Lenstra, A. H. G. R. Kan, and D. B. Shmoys, editors. The Traveling Salesman Problem - A Guided Tour of Combinatorial Optimization. Wiley Series in Discrete Mathematics and Optimization. John Wiley and Sons, New York, USA, 1985.
18. R. V. Lenth. Java Applets for Power and Sample Size, 2006.
19. D. C. Montgomery. Design and Analysis of Experiments. John Wiley and Sons Inc, 6th edition, 2005.
20. G. Reinelt. TSPLIB - A traveling salesman problem library. ORSA Journal on Computing, 3:376-384, 1991.
21. E. Ridge. Design of Experiments for the Tuning of Optimisation Algorithms. PhD thesis, Department of Computer Science, The University of York, 2007.
22. E. Ridge and E. Curry. A Roadmap of Nature-Inspired Systems Research and Development. Multi-Agent and Grid Systems, 3(1), 2007.
23. E. Ridge and D. Kudenko. Analyzing Heuristic Performance with Response Surface Models: Prediction, Optimization and Robustness. In Proceedings of the Genetic and Evolutionary Computation Conference, volume 1, pages 150-157. ACM, 2007.
24. E. Ridge and D. Kudenko. Screening the Parameters Affecting Heuristic Performance. In Proceedings of the Genetic and Evolutionary Computation Conference, volume 1. ACM, 2007.
25. E. Ridge and D. Kudenko. Tuning the Performance of the MMAS Heuristic. In T. Stützle, M. Birattari, and H. Hoos, editors, Engineering Stochastic Local Search Algorithms. Designing, Implementing and Analyzing Effective Heuristics, volume 4638 of Lecture Notes in Computer Science, pages 46-60. Springer, Berlin / Heidelberg, 2007.
26. T. Stützle and H. H. Hoos. Max-Min Ant System. Future Generation Computer Systems, 16(8):889-914, 2000.
27. J. I. van Hemert. Property Analysis of Symmetric Travelling Salesman Problem Instances Acquired Through Evolution. In Proceedings of the Fifth Conference on Evolutionary Computation in Combinatorial Optimization, volume 3448 of Lecture Notes in Computer Science, pages 122-131. Springer-Verlag, Berlin, 2005.
28. E. Zemel. Measuring the quality of approximate solutions to zero-one programming problems. Mathematics of Operations Research, 6:319-332, 1981.
29. M. Zlochin and M. Dorigo. Model-based search for combinatorial optimization: a comparative study. In Proceedings of the Seventh International Conference on Parallel Problem Solving from Nature, volume 2439 of Lecture Notes in Computer Science, pages 651-661. Springer-Verlag, Berlin, Germany, 2002.