Central force optimisation: a new gradient-like metaheuristic for multidimensional search and optimisation

Richard A. Formato
P.O. Box 1714, Harwich, MA 02645, USA
E-mail: [email protected]

Abstract: This paper introduces central force optimisation, a novel, nature-inspired, deterministic search metaheuristic for constrained multidimensional optimisation in highly multimodal, smooth, or discontinuous decision spaces. CFO is based on the metaphor of gravitational kinematics. The algorithm searches a decision space by ‘flying’ its ‘probes’ through the space by analogy to masses moving through physical space under the influence of gravity. Equations are developed for the probes’ positions and accelerations using the gravitational metaphor. Small objects in our universe can become trapped in close orbits around highly gravitating masses. In ‘CFO space’ probes are attracted to ‘masses’ created by a user-defined function of the value of an objective function to be maximised. CFO may be thought of in terms of a vector ‘force field’ or, loosely, as a ‘generalised gradient’ methodology because the force of gravity can be computed as the gradient of a scalar potential. The CFO algorithm is simple and easily implemented in a compact computer program. Its effectiveness is demonstrated by running CFO against several widely used benchmark functions. The algorithm exhibits very good performance, suggesting that it merits further study.

Keywords: central force optimisation; CFO; optimisation; multidimensional search; deterministic metaheuristic; generalised gradient; bio-inspired; nature-inspired; evolutionary algorithm.

Reference to this paper should be made as follows: Formato, R.A. (2009) ‘Central force optimisation: a new gradient-like metaheuristic for multidimensional search and optimisation’, Int. J. Bio-Inspired Computation, Vol. 1, No. 4, pp.217–238.

Biographical notes: Richard A. Formato received his JD from Suffolk University Law School; his PhD and MS degrees from the University of Connecticut and his MSEE and BS (Physics) degrees from Worcester Polytechnic Institute. In the early 1990s, he began applying genetic algorithms to design problems in applied electromagnetics. CFO is an outgrowth of that work and of his continuing interest in nature-inspired and evolutionary algorithms.
1  Introduction
Central force optimisation (CFO) is a new nature-inspired metaheuristic for multidimensional search and optimisation. CFO is based on the metaphor of gravitational kinematics, the branch of Physics that deals with the motion of masses moving under the influence of gravity. Because the equations of motion are deterministic, so too is CFO. This important characteristic sets CFO apart from those biology-inspired metaheuristics that are inherently stochastic. Nature-inspired search and optimisation algorithms have proven very effective, but often they are based on analogies drawn from biology, which inevitably leads to uncertainty. Biological metaphors mimic the randomness inherent in natural biological systems, thereby reflecting their genesis in computed results that vary from one run to the next. Assessing the performance of a biology-based algorithm requires a statistical analysis. For example, ant colony optimisation (ACO) (Dorigo et al., 1991) and particle
swarm optimisation (PSO) (Kennedy and Eberhart, 1995), both of which are highly developed and widely used, require randomness at each step in their calculations. Because every ACO or PSO run yields different results, an assessment of their performance necessarily involves calculating means and standard deviations over very many runs. CFO, by contrast, provides the significant advantage of always producing precisely the same result from runs with the same initial parameters. The law of gravitation on which CFO is based is completely deterministic, so that the algorithm itself is inherently deterministic. This paper develops CFO theory and applies a simple CFO implementation to some representative benchmark functions. The purposes of this paper are to introduce CFO as a conceptual framework, that is, a ‘metaheuristic’ and to illustrate its effectiveness with selected examples. This paper is not intended to provide an exhaustive study of CFO’s effectiveness against a large suite of benchmarks or to present a methodology for choosing CFO run parameters.
CFO is presented on an empirical basis because at this point the algorithm is in its infancy. Presently there is no deep theoretical foundation or proof of convergence. Indeed, it is the author’s hope that this and the other CFO papers (Formato, 2006, 2007, 2008) will generate sufficient interest that the concept will be further developed by researchers far better qualified than he.
2  The CFO metaphor
All material objects in the universe are attracted by the force of gravity. The formal statement of this principle is Newton’s universal law of gravitation. Gravity is a vector force field embodying the notion of ‘action at a distance’. Any two masses, regardless of how large or how small or how close or far apart, are drawn to each other by a force acting along the line joining their centres of mass (a ‘central force’) whose magnitude is proportional to the product of the masses and inversely proportional to the square of the separation distance (see generally Goldstein, 1965; Marion, 1970). This principle is at the heart of the CFO metaphor. Imagine an object whose mass is small compared to the earth’s moving through space on a trajectory that brings it close to our planet. Under certain conditions the object can become gravitationally trapped, so that its orbit is modified by the encounter, at least for a while. In the absence of energy dissipation, orbital changes can occur for some period of time while the object and the earth exchange energy in a conservative manner. The small object may become ‘trapped’ in proximity to the earth, and it is this effect that the CFO metaphor embraces. Of course, CFO is an algorithm, a step-by-step procedure for processing numbers. It is not literally, nor is it intended to be, a precise model of how a small mass moves through space on a path that brings it close to a planet. Indeed, the problem of calculating the motion of even three gravitating bodies remains unsolved to this day, after nearly 250 years [Marion, (1970), §8.12]! CFO is a ‘conceptual’ approach to multidimensional search and optimisation. It draws its inspiration from gravitational kinematics and, in a formal way reflects the equations underlying gravitational motion. But the similarity ends there, and the CFO algorithm designer has wide latitude in modifying how calculations are performed in order to improve performance. CFO, like most other optimisation algorithms, includes several free parameters that are available to adjust how well it works. They are described in the Appendix, where the general CFO theory is developed, along with the equations and pseudocode for the specific CFO implementation used here. In this paper, CFO is formulated in terms of the vector equations inspired by the gravitational metaphor. But CFO also lends itself to a more purely mathematical interpretation. Because the gravitational field is a conservative force field with an inverse square distance dependence, the force of gravity can be expressed as the gradient of a scalar potential function. Thus, some CFO implementations could be considered a type of gradient-like
optimisation using a ‘generalised gradient’ or generalised ‘directional derivative’. How appropriate this interpretation is depends on the definition of ‘mass’ in CFO space. In the model developed in the Appendix, for example, the CFO exponents are α = 2, β = 2. With these values the ratio of the difference of objective function fitnesses multiplied by the unit step (CFO ‘mass’) divided by the distance between probes looks a lot like the square of a directional derivative. Of course, this ratio does not in fact represent a slope because of the unit step function. Nevertheless, in this case the notion of a ‘generalised gradient’ is clear. But with different values of α and β, or with an entirely different mass definition, the similarity to a true derivative might be far less compelling; hence, the descriptor ‘gradient-like’. The vector model always applies as an analogy and in that sense is more general. However, the ‘generalised gradient’ notion embodies an aspect of how CFO works that perhaps is worth emphasising in the nomenclature. In the end, of course, either (or neither) interpretation can be used because, after all, CFO is an algorithm that is metaphorical in nature. The real question is whether or not it works, not what it should be called.

In what follows, several widely used benchmark functions are used to test CFO’s performance. They have been chosen for the purpose of illustrating CFO’s behaviour. The optimiser source code is based on the pseudocode in Appendix A.3. CFO employs seven user-specified parameters: Nt, Np, G, α, β, Δt, Frep (see the Appendix for definitions). Note that the initial probe acceleration, which is assumed to be zero in this paper, could be used as an eighth parameter if desired. If G and Δt are constant, then they combine multiplicatively in a single constant coefficient. For most CFO runs and for all the runs reported here, the CFO exponents α, β are set to 2 because this value seems to provide good results for many groups of functions. Consequently, these exponents may not require much tweaking to achieve good results over a wide range of objective functions. In marked contrast, the parameters Np and Frep make a big difference in how CFO performs, so that assigning their values is an important aspect of setting up a run and largely an open question. As a practical matter, CFO’s performance seems to depend mostly on these two parameters. Another important non-numerical factor is the initial probe distribution, because that, too, can make a big difference in the algorithm’s performance by how well it samples the decision space topology.
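To make the preceding description concrete, the following is a minimal sketch, in Python, of the kind of probe update the gravitational metaphor suggests. It is not the implementation of Appendix A (which defines the actual equations and the Frep repositioning of errant probes); the function and variable names (`cfo_step`, `positions`, `fitnesses`) are illustrative, and the update assumed here is an acceleration proportional to G·U(Mk − Mp)·(Mk − Mp)^α/|Rk − Rp|^β directed from probe p toward probe k, followed by a half-acceleration position step over Δt.

```python
import numpy as np

def unit_step(x):
    """U(x): 1 if x >= 0, else 0 (keeps the CFO 'mass' non-negative)."""
    return 1.0 if x >= 0.0 else 0.0

def cfo_step(positions, fitnesses, G=2.0, alpha=2.0, beta=2.0, dt=1.0):
    """One illustrative CFO-style update (a sketch, not the paper's Appendix A code).

    positions : (Np, Nd) array of probe coordinates at the current time step
    fitnesses : (Np,) array of objective-function values at those coordinates
    Returns probe positions after one gravitational 'flight' of duration dt.
    Repositioning of probes that leave the decision space (the Frep scheme) is omitted.
    """
    positions = np.asarray(positions, dtype=float)
    num_probes, num_dims = positions.shape
    new_positions = positions.copy()
    for p in range(num_probes):
        accel = np.zeros(num_dims)
        for k in range(num_probes):
            if k == p:
                continue
            diff = positions[k] - positions[p]   # vector from probe p toward probe k
            dist = np.linalg.norm(diff)
            if dist == 0.0:
                continue                          # coincident probes exert no force here
            mass = unit_step(fitnesses[k] - fitnesses[p]) * (fitnesses[k] - fitnesses[p])
            accel += G * (mass ** alpha) * diff / (dist ** beta)
        new_positions[p] = positions[p] + 0.5 * accel * dt ** 2
    return new_positions
```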
3  30D modified Griewank
The 30-dimensional modified Griewank function is defined as

$$f(\vec{x}) = -\frac{1}{4000}\sum_{i=1}^{30}(x_i - x_0)^2 + \prod_{i=1}^{30}\cos\!\left(\frac{x_i - x_0}{\sqrt{i}}\right) - 1$$
where –600 ≤ xi ≤ 600, x0 = 75.123. The Griewank’s global maximum value is zero at xi = 75.123, i = 1,..,30. This function is extremely multimodal and one of the most
challenging benchmark functions because the number of local maxima increases exponentially with increasing decision space dimensionality (Hamdan, 2008). In addition, offsetting the maximum from the origin (x0 = 0) to a substantially distant point (x0 = 75.123) makes it even more difficult to locate the global maximum (it appears that an offset is uncommon in the literature). The Griewank’s complexity is illustrated by the two-dimensional version plotted in Figure 1 in a truncated region around the maximum.

Figure 1  2D Griewank near the global maximum (see online version for colours)
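For readers who want to evaluate the benchmark themselves, the following is a small sketch of the offset, negated Griewank objective exactly as defined above; the function name `griewank_offset` is illustrative only.

```python
import numpy as np

def griewank_offset(x, x0=75.123):
    """Modified (offset, negated) 30D Griewank; global maximum of 0 at x_i = x0."""
    shifted = np.asarray(x, dtype=float) - x0
    i = np.arange(1, shifted.size + 1)
    return -np.sum(shifted ** 2) / 4000.0 + np.prod(np.cos(shifted / np.sqrt(i))) - 1.0

print(griewank_offset(np.full(30, 75.123)))  # ~0.0 at the known maximiser
```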
A 30-dimensional CFO run was made with the following empirically determined parameters: Nt = 1,000, Np = 780, G = 2, α = 2, β = 2, Δt = 1, Frep = 0.5 (defined in the Appendix). The initial acceleration was set to zero with 26 probes distributed uniformly along each of the 30 coordinate axes (Nd = 30, Np = 780). Deploying the initial probes on-axis avoids biasing CFO toward the global maximum because the maximum is located on the decision space principal diagonal well away from the coordinate axes.

Table 1  30D Griewank best fitness

Step #    Fitness         Neval
0         –41.9175605     780
1         –41.9175605     1,560
2         –19.8660752     2,340
3         –0.8819751      3,120
4         –0.0654384      3,900
5         –0.0459761      4,680
25        –0.0171346      20,280
27        –0.0135268      21,840
28        –0.0042324      22,620
48        –0.0040951      38,220
135       –0.0034489      106,080
151*      –0.0030385      118,560
1,000     –0.0030385      780,780
Figure 2 plots the evolution of the Griewank’s best fitness as a function of time step through Step 250. Table 1 lists the computed values at time steps where the fitness changed (note, no change between Steps 0 and 1 because the initial acceleration is zero). Neval = Np(Nt + 1) is the total number of function evaluations. The initial fitness increase is extremely rapid, reaching a value of –0.0459761 after only five time steps. Thereafter the increase is much slower, reaching –0.0042324 at Step 28, –0.0040951 at Step 48 and –0.0034489 at Step 135. At Step 151 the fitness saturates at –0.0030385 (marked by *) and remains there for the remainder of the run through Step 1,000. The run was terminated at Step 1,000 because it appeared unlikely there would be any further significant improvement in the fitness. The algorithm’s behaviour in this case appears to be representative of how CFO works in highly multimodal decision spaces. CFO frequently converges very quickly to the vicinity of the global maximum and thereafter the fitness improvement is very slow. This characteristic could be useful in pointing CFO, or another optimiser, to the general location of global maxima. It is instructive to compare CFO’s convergence in this case to another state-of-the-art algorithm. For example, Hamdan (2008) reports that PSO using ring neighbourhood topology, 50 particles and 1,000 generations averaged over 30 runs returns a maximum fitness of –0.001738 with a standard deviation σ = ±0.004388 [note: using this paper’s notation, i.e., –f(x), because ring-PSO did minimisation]. Achieving this level of accuracy required a total of 1,500,000 function evaluations and a highly refined PSO implementation. CFO, by contrast, reached a repeatable maximum of –0.0030385 with only 118,560 function evaluations, 92% fewer than PSO. While the average PSO result was more accurate, on any given run PSO might perform considerably worse in view of the large standard deviation. Another difference is that the Griewank’s maximum was offset for CFO but not for PSO, which may affect how well PSO would perform on the modified Griewank.

A measure of how well CFO probes converge on a maximum is the average distance between the probe with the best fitness and all other probes. Figure 3 plots this distance normalised to the size of the decision space, that is,

$$D_{avg} = \frac{1}{L\,(N_p - 1)} \sum_{p=1}^{N_p} \sqrt{\sum_{i=1}^{N_d} \left[ R(p,i,j) - R(p^*,i,j) \right]^2}$$

where p* is the number of the probe with the best fitness; $L = \sqrt{\sum_{i=1}^{N_d} (x_i^{max} - x_i^{min})^2}$ is the length of the decision space principal diagonal; $x_i^{min} \le x_i \le x_i^{max}$, i = 1, .., Nd, define the minimum/maximum values of each coordinate; Davg is the average distance, and other variable definitions appear in Appendix A.1.
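A sketch of this diagnostic, assuming probe positions are held in an (Np, Nd) array; `avg_distance` and its arguments are illustrative names, not the paper’s code.

```python
import numpy as np

def avg_distance(positions, bounds_min, bounds_max, best_index):
    """D_avg: average Euclidean distance from the best probe to all probes,
    normalised by the length L of the decision space principal diagonal."""
    positions = np.asarray(positions, dtype=float)
    num_probes = positions.shape[0]
    diag_length = np.linalg.norm(np.asarray(bounds_max, dtype=float) -
                                 np.asarray(bounds_min, dtype=float))
    distances = np.linalg.norm(positions - positions[best_index], axis=1)
    return distances.sum() / (diag_length * (num_probes - 1))
```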
Figure 2  30D Griewank best fitness vs. time step (see online version for colours)

Figure 3  30D Griewank Davg vs. time step (see online version for colours)
Comparing Figures 2 and 3, through about Step 4, while the Griewank’s fitness is increasing very quickly, the average distance from the best probe to all others decreases just as fast. After Step 5 the fitness increases very, very slowly (the curve is essentially flat), but the average distance exhibits a wide variability. Between Steps 5 and 25 it continues to decrease, and a pronounced oscillation has set in. Thereafter Davg increases monotonically with a superimposed oscillation. Through about Step 70 the oscillation has a large amplitude, followed by a very rapid increase until about Step 75 where the curve begins to plateau. After that point, Davg exhibits several plateaus of varying duration, all accompanied by a smaller amplitude oscillation. The oscillation near Step 160, for example, is between values of 0.0436 and 0.0443. The curve in Figure 3 is plotted for 160 time steps because that range highlights the step-wise oscillation. This peculiar feature, which has been observed in many CFO Davg curves, seems to be a manifestation of the same gravitational trapping mechanism that may be experienced by a small near-earth object (NEO). If so, it reinforces CFO’s metaphor of gravitational kinematics, and it serves as a harbinger of local trapping. The combination of plateaus with oscillation is remarkably similar to some of the curves seen in resonant gravitational trapping during encounters with NEOs such as asteroids. Figure 4 reproduces with permission the graph in Figure 2 on page 9 of Schweickart et al. (2006). It plots the required velocity change Δv versus time to modify asteroid Apophis’ trajectory in a near-earth encounter. The curve is based on the analysis of gravitational ‘resonant returns’ developed in Valsecchi et al. (2003), which addresses the significant modifications to a small object’s orbit that may result from gravitational perturbations in a close planetary encounter. The structural similarity of the curves in the Griewank’s Davg plot and Schweickart’s Δv plot is striking. Even though these curves represent different quantities, it is difficult to imagine that their similarity is purely accidental. Rather, it seems reasonable to surmise that both curves are different manifestations of the same phenomenon, namely, gravitational trapping. While this conclusion is admittedly speculative, the similarity of these plots seems to be a reasonable basis for exploring the applicability to CFO of some of the techniques used in celestial mechanics. At a minimum, the similarity arguably supports the view that CFO’s gravitational analogy is a sound basis for searching a decision space for an objective function’s maxima.

Figure 4  Reproduction of Schweickart’s Figure 2

4  30D Schwefel Problem 2.26
Another example of how well CFO performs against a highly multimodal function is the 30-dimensional Schwefel Problem 2.26 defined by

$$f(\vec{x}) = \sum_{i=1}^{30} x_i \sin\!\left(\sqrt{|x_i|}\right), \quad -500 \le x_i \le 500.$$
The Schwefel has a known global maximum value of 12,569.5 at xi = 420.9687, i = 1,..30. The plot in Figure 5 illustrates its complexity in two dimensions.
Figure 5  2D Schwefel Problem 2.26 (see online version for colours)
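A sketch of the Schwefel 2.26 objective in the maximisation form given above (`schwefel_226` is an illustrative name):

```python
import numpy as np

def schwefel_226(x):
    """Schwefel Problem 2.26; 30D global maximum of about 12,569.5 at x_i = 420.9687."""
    x = np.asarray(x, dtype=float)
    return np.sum(x * np.sin(np.sqrt(np.abs(x))))

print(schwefel_226(np.full(30, 420.9687)))  # ~12569.49
```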
A long CFO run was made with the following parameters: Nt = 2500, Np = 240, G = 2, α = 2, β = 2, Δt = 1, Frep = 0.5. The initial acceleration was set to zero with 8 probes distributed uniformly along each of the 30 coordinate axes (Nd = 30, Np = 240). As with the Griewank, the on-axis initial probe distribution avoids biasing CFO toward the global maximum because the maximum is located on the decision space principal diagonal (in this case it is not necessary to offset the maximum as with the Griewank). Figure 6 plots the evolution of the Schwefel’s best fitness, while Table 2 lists the fitness values at time steps where they change. CFO’s performance against this function is quite remarkable. In only eight time steps with a total of only 2,160 function evaluations, the fitness climbed to 12,569.0946054, compared to its known value of 12,569.5. As the plot and table show, the fitness saturated at Step 8, not increasing even slightly through the end of the run at Step 2,500. This example clearly demonstrates how fast CFO converges with the correct set of run parameters.

Figure 6  30D Schwefel best fitness vs. time step (see online version for colours)

By comparison, in one study (Hamdan, 2008) the result obtained using PSO with a hybrid topology, 50 particles and 1,000 iterations averaged over 30 runs (Neval = 1,500,000) was a best fitness of 8,914.60547, σ = ±725.81543. In another study, a group of four different versions of PSO returned mean values between 6,021.732 (σ = ±888.506) and 6,759.77 (σ = ±615.6246) [Cui et al., 2006, from data in Table 4 adjusted according to equation (31) in that paper for direct comparison here]. Locating the Schwefel’s global maximum evidently is a severe test for an optimiser, and CFO acquits itself quite well against this benchmark.

Table 2  30D Schwefel 2.26 fitness

Step #    Fitness           Neval
0         187.9161759       240
1         187.9161759       480
2         5,014.1076680     720
4         5,100.7660296     1,200
5         5,401.0407035     1,440
6         11,616.1768812    1,680
7         12,567.8511841    1,920
8*        12,569.0946054    2,160
2,500     12,569.0946054    600,240
The Schwefel Problem 2.26 also is another example of what again appears to be ‘gravitational trapping’ as evidenced by oscillation in the Schwefel’s Davg curve (Figure 7). But here the trapping is, in fact, at the global maximum, not at a local maximum. Whether or not this simply is fortuitous is not known. The oscillation follows erratic changes in Davg as the fitness increases through Step 8. Once the oscillation sets in, it persists for the remainder of the run through Step 2,500 with the repeating sequence of values 0.1401, 0.1591, 0.1695, 0.1756. Note that the abscissa in Figure 7 is limited to 100 in order to clearly show the oscillation.

Figure 7  30D Schwefel Davg vs. time step (see online version for colours)
The Schwefel and the Griewank, as well as other benchmark functions reported elsewhere (Formato, 2006, 2007, 2008), convincingly support the view that oscillation in CFO’s Davg curve is a sufficient, but not necessary, condition for local trapping. If trapping fortuitously occurs at the global
maximum, then CFO has met its objective and the algorithm can be terminated. If, on the other hand, trapping occurs at a local maximum, then some methodology must be developed to move CFO away from the local maximum in order to continue searching the decision space. How best to accomplish this is a very important open question.
5  2D Goldstein-Price function

As a general proposition, CFO seems to perform better on highly multimodal, ‘lumpy’ functions than on ‘smoother’ ones. An example of such a function is the two-dimensional Goldstein-Price function defined by Cui et al. (2006)

$$f(x_1, x_2) = -\left[1 + (x_1 + x_2 + 1)^2\,(19 - 14x_1 + 3x_1^2 - 14x_2 + 6x_1x_2 + 3x_2^2)\right] \times \left[30 + (2x_1 - 3x_2)^2\,(18 - 32x_1 + 12x_1^2 + 48x_2 - 36x_1x_2 + 27x_2^2)\right]$$

–100 ≤ x1, x2 ≤ 100. This function has far fewer local maxima than, for example, the Griewank or the Schwefel. Its global maximum value is –3 at the point (0, –1). It is plotted in Figure 8. While the Goldstein-Price function is smoother than the previous highly multimodal examples, its ordinate varies over a tremendous range, nearly 19 orders of magnitude.

Figure 8  2D Goldstein-Price function (see online version for colours)

Two CFO runs were made with the following parameters: Np = 16, G = 2, α = 2, β = 2, Δt = 1, Frep = 0.97 and Nt = 50 for a short run, Nt = 2,500 for a long run. The repositioning factor (see Appendix A.3) has been increased from 0.5 in the two previous examples to 0.97 (determined empirically as are all other CFO parameters). As before, the initial probe acceleration was set to zero. But, unlike the previous runs that deployed initial probes on the decision space axes, in this case a uniform grid of 16 probes was used that spans the decision space (Nd = 2, Np = 16). The initial probe distribution in the x1-x2 plane is shown in Figure 9. Note that, because the global maximum is very close to the origin, placing the four nearest initial probes at the points (±33.333..., ±33.333…) does not bias CFO toward the maximum. The closest initial probes actually are quite far away as a fraction of the decision space’s size.

Figure 9  2D Goldstein-Price initial probe distribution
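A sketch of the negated Goldstein-Price objective defined above (`goldstein_price_neg` is an illustrative name):

```python
def goldstein_price_neg(x1, x2):
    """Negated Goldstein-Price; global maximum of -3 at (0, -1)."""
    a = 1 + (x1 + x2 + 1) ** 2 * (19 - 14 * x1 + 3 * x1 ** 2
                                  - 14 * x2 + 6 * x1 * x2 + 3 * x2 ** 2)
    b = 30 + (2 * x1 - 3 * x2) ** 2 * (18 - 32 * x1 + 12 * x1 ** 2
                                       + 48 * x2 - 36 * x1 * x2 + 27 * x2 ** 2)
    return -a * b

print(goldstein_price_neg(0.0, -1.0))  # -3.0
```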
Another change is that in this case CFO was implemented with ‘adaptive truncation’ of the decision space size in order to illustrate how its performance can be improved against smoother functions like the Goldstein-Price. For simplicity, shrinking the decision space was hardwired into CFO at a predetermined time step. In a more robust implementation, changing the decision space size would be done in response to some built-in criterion, for example, saturation of the best fitness. However, because the primary purpose of this paper is to introduce the CFO concept by way of examples, the simpler approach was used. Table 3 lists the best fitness values at time steps when they changed. The asterisk at Step 21 marks the transition from the initial decision space size of –100 ≤ x1,x2 ≤ 100 to a smaller space –5 ≤ x1,x2 ≤ 5. The data at Step 21 and thereafter reflect the smaller decision space. The new dimensions and the transition point were determined empirically. Table 3 shows the tremendous range of function values in the Goldstein-Price and also how quickly CFO can converge using an adaptive strategy. The best fitness returned by the initial probe distribution is –4.482…× 1010. With only 400 function evaluations, the fitness has increased to –851.3940625 at Step 24. And for the long run the fitness saturated at –3.0153226 at Step 384. By comparison, the fast PSO algorithms in Cui et al. (2006) did locate the maximum more accurately, but CFO’s convergence is very rapid with very good fractional accuracy of 5.1075 × 10–3.
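In this paper the shrink point and the new bounds were hardwired, but the same idea can be expressed as a feedback rule that fires when the best fitness stops improving. The following is an illustrative sketch only; the function name, the patience threshold and the shrink factor are assumptions, not values taken from the paper.

```python
import numpy as np

def maybe_shrink_bounds(bounds_min, bounds_max, best_position,
                        steps_since_improvement, patience=20, shrink=0.05):
    """Recentre and shrink the decision space around the current best probe
    once the best fitness has stalled for `patience` steps (clipped so the
    truncated space stays inside the original bounds)."""
    bounds_min = np.asarray(bounds_min, dtype=float)
    bounds_max = np.asarray(bounds_max, dtype=float)
    if steps_since_improvement < patience:
        return bounds_min, bounds_max
    half_width = 0.5 * shrink * (bounds_max - bounds_min)
    new_min = np.maximum(bounds_min, np.asarray(best_position, dtype=float) - half_width)
    new_max = np.minimum(bounds_max, np.asarray(best_position, dtype=float) + half_width)
    return new_min, new_max
```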
Table 3  2D Goldstein-Price best fitness

Step #    Fitness                   Neval
0         –44824815414.8148157      16
1         –44824815414.8148157      32
2         –28781006822.2222083      48
3         –26761830525.9259127      64
4         –16405640162.2293156      80
5         –15086399642.3596870      96
6         –8711814424.5737621       112
7         –7895509108.6739887       128
8         –882647295.2130282        144
9         –41864166.3044985         160
10        –4130009.0511480          176
11        –5103.4628243             192
21*       –4867.6831176             352
22        –2663.8926925             368
23        –1772.3271822             384
24        –851.3940625              400
26        –90.9939622               432
29        –37.4393365               480
37        –36.5034552               608
39        –35.7466280               640
40        –26.8660199               656
41        –24.9221172               672
42        –10.4047481               688
43        –8.5153116                704
44        –4.3660577                720
46        –3.0679795                752
206       –3.0334659                3,312
384*      –3.0153226                6,160
2500      –3.0153226                40,000
The final probe positions for the short and long runs are plotted in Figures 10(a) and 10(b), respectively. After 50 steps the probes show significant clustering near the global maximum, but most of the 16 probes have not converged on it. By Step 2,500 the situation is quite different with all probes coalesced on the maximum, at least to the degree that visual inspection permits. The returned best fitnesses and corresponding coordinates are listed in Tables 4 and 5. The improvement between Steps 50 and 2,500 is not great, with the fractional accuracy decreasing from 2.266 × 10⁻² to 5.1075 × 10⁻³. What does change dramatically is the clustering of probes as reflected in the range of best fitness values. At Step 50, the fifth best fitness is –6.62929 × 10⁸, whereas at Step 2,500 the corresponding value is –14.7996, a very substantial change indeed.
Figure 10  (a) Goldstein-Price probes at Step 50 (b) Goldstein-Price probes at Step 2,500 (see online version for colours)

Table 4  2D Goldstein-Price best fitnesses at Step 50

x1             x2            Fitness
–4.27187E-3    –0.988775     –3.06798
–6.88306E-2    –1.02886      –4.21183
8.38114        –4.20012      –6.39578E+8
7.85619        –3.01768      –6.57804E+8
7.85619        –3.00285      –6.62929E+8
7.85619        7.85619       –3.06392E+9
Table 5  2D Goldstein-Price best fitnesses at Step 2,500

x1             x2            Fitness
–2.02196E-3    –1.00524      –3.01532
–7.61421E-2    –0.994774     –4.6618
–7.61421E-2    –1.04768      –4.77665
–7.61421E-2    –1.05033      –4.85136
7.61421E-2     –1.11227      –14.7996
Figure 12 2D Goldstein-Price Davg curve, 50 steps (see online version for colours)
Figure 11 plots the evolution of the Goldstein-Price’s best fitness for the first 50 time steps. Because of the very large range of fitness values, saturation appears to take place at Step 9, but, as the tabulated data show, saturation actually begins around Step 46 with improvement through Step 384. Figure 11 2D Goldstein-Price best fitness vs. time step (see online version for colours)
Figure 13 Goldstein-Price Davg curve, 2,500 steps (see online version for colours)
Figures 12 and 13, respectively, plot the Davg curves for the short and long runs. During the first 50 steps the average probe distance decreases smoothly and monotonically. The curve is quite different in appearance from the Griewank’s and Schwefel’s, both of which exhibit significant oscillation. Even the 2500 step curve shows no sign of oscillation. Davg decreases very quickly and smoothly and appears to fluctuate randomly, noise-like, just above zero, which, of course, corresponds to complete coalescence of all probes. Careful inspection of the tabulated plot data reveals no oscillatory pattern. The smooth nature of the GoldsteinPrice function, along with its relatively fewer local maxima, apparently discourage gravitational trapping of CFO’s probes, whereas the very highly multimodal nature of the Griewank and Schwefel make it far easier for probes to become trapped. Of course, this interpretation of the Davg curves’ significance is speculative, but it does seem plausible in view of CFO’s gravitational metaphor.
6  CFO as a topology mapper
CFO exhibits what appears to be a unique characteristic that may be useful in conjunction with any optimisation algorithm, not only CFO, namely, its ability to distribute probes over multiple maxima. This attribute may make CFO useful as a ‘topology mapper’ that helps locate distributed maxima before a decision space is searched. Any optimiser’s efficiency and accuracy should improve if regions containing maxima can be identified before the search begins.
In a decision space containing an uncountable number of global maxima, most optimisation algorithms converge on only one maximum. In some applications, however, the maximum that is found may not actually be the ‘best’ one. In real-world problems, such as engineering design, identical fitness values do not necessarily reflect fungible designs. As a practical matter, of many solutions with the same fitness one usually is better than the others for reasons that may not be quantifiable in an objective function. In cases where there are many indistinguishable or nearly indistinguishable maxima, CFO may be useful in exploring the decision space as a pre-processor that aids the optimiser in locating many maxima, not just one. An example of CFO’s behaviour in this regard is provided by the two-dimensional Gaussian grid function defined by

$$f(x_1, x_2) = \min\left\{1,\ \sum_{i=1}^{3}\left( e^{-\left[x_1 + 50(i-2)\right]^2/\sigma^2} + e^{-\left[x_2 + 50(i-2)\right]^2/\sigma^2} \right)\right\} + \frac{1}{40}\sin\!\left(\frac{2\pi\sqrt{x_1^2 + x_2^2}}{10}\right)$$

−100 ≤ x1, x2 ≤ 100, σ = 2.
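A sketch of the Gaussian grid function as reconstructed above; the sinusoidal term’s amplitude and period follow the stated maximum value of 1.025, and `gaussian_grid` is an illustrative name.

```python
import numpy as np

def gaussian_grid(x1, x2, sigma=2.0):
    """Gaussian ridges along x1 = 0, ±50 and x2 = 0, ±50, clipped at 1 and
    modulated by a small radial sinusoid (maxima of ~1.025 on the grid lines)."""
    ridges = 0.0
    for i in (1, 2, 3):
        ridges += np.exp(-((x1 + 50.0 * (i - 2)) ** 2) / sigma ** 2)
        ridges += np.exp(-((x2 + 50.0 * (i - 2)) ** 2) / sigma ** 2)
    ripple = np.sin(2.0 * np.pi * np.sqrt(x1 ** 2 + x2 ** 2) / 10.0) / 40.0
    return min(1.0, ridges) + ripple
```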
This function is not a recognised benchmark and will be used only to demonstrate how CFO behaves. The grid is plotted in Figure 14. Its global maxima with a value of 1.025 lie on the grid lines x1 = 0, ±50; x2 = 0, ±50. CFO was run with the following parameters: Np = 2500, G = 2, α = 2, β = 2, Δt = 1, Frep = 0.97 and Nt = 50. The initial probes were distributed over the decision space in a uniform grid (not shown). The very large number of probes was used only for the purpose of visualising how CFO evolves their distribution over the space as the run progresses.

Figure 14  2D Gaussian grid (see online version for colours)

The probe distributions at Steps 24 and 50 appear in Figures 15(a) and 15(b), respectively. At Step 24 the Gaussian grid’s structure is quite evident, and by Step 50 it is fully resolved. Interestingly, the grid lines passing through the origin do not contain probes for x1,2 > 50. The likely reason is that probes on these segments were attracted to the large probe concentration near the origin. This thinning effect also seems to be occurring at Step 50 on the segments –50 ≤ x1 ≤ 50, x2 = 0 and –50 ≤ x2 ≤ 50, x1 = 0, where the probe density near the segment centre is lower than at the ends.

Figure 15  (a) Gaussian grid probes at Step 24 (b) Gaussian grid probes at Step 50 (see online version for colours)

The Gaussian grid also provides a compelling example of the validity of CFO’s gravitational metaphor. The attractive force of gravity requires positive mass and a positive gravitational constant. Because the CFO implementation employed here forces the mass to be positive (see Appendix A.1), the effect of negative gravity is readily demonstrated by setting the gravitational constant G to a negative value. Figure 16(a) shows the effect of running CFO with the same parameters except G = –2 instead of G = +2. The result is quite dramatic. Instead of clustering along the grid lines, the probes are symmetrically forced to the very edges of the decision space. The result of negative gravity also is evident in the Davg plot in Figure 16(b). The average probe distance increases monotonically from just above 0.35 at Step 1 to just below 0.405 at Step 50, clearly showing that the probes are flying away from each other instead of being attracted to the maxima.
Figure 16 (a) Gaussian grid probes at Step 50 with G = –2 (b) Gaussian grid Davg with G = –2 (see online version for colours)
7  2D and 30D step function

The modified step is defined by Yao et al. (1999)

$$f(\vec{x}) = -\sum_{i=1}^{N_d} \left( \left\lfloor x_i - x_0 + 0.5 \right\rfloor \right)^2, \quad -100 \le x_i \le 100,\ i = 1, \ldots, N_d.$$

The step is unimodal and highly discontinuous. Its global maximum of zero is offset from the origin to the point (x0,…,x0) on the decision space’s principal diagonal to avoid biasing CFO (for runs reported here x0 = 75.123). Four runs were made with Nd = 2, 30 and Frep = 0.5, 0.95. The 2D step is useful as another example of an ‘adaptive’ CFO algorithm, while the 30D case permits comparison to published results. The different values of Frep illustrate the effectiveness of this parameter in mitigating local trapping. Figure 17(a) plots the 2D step over its entire domain, while Figure 17(b) provides an expanded view in the vicinity of the maximum. The step’s value varies over a very wide range, from the maximum of zero in a region containing the point (75.123, 75.123) to a minimum below –60,000. Note that the vertical faces in Figure 17(b) may not appear to be perpendicular to the horizontal faces because of the granularity with which the function is computed.

7.1  2D step
The initial CFO probe distribution comprises 20 equally spaced probes on the x1 and x2 axes as shown in Figure 18 (Np = 40). Long and short runs were made with Nt = 250 and Nt = 4, both with α = 2, β = 2, G = 2, Δt = 1 and Frep = 0.5. Figure 19 shows that all but two of the initial on-axis probes have converged at Step 250. The reason for making the shorter CFO run is to illustrate that many probes have clustered near the maximum as early as Step 4 (Figure 20). CFO misses the maximum of zero, but it returns the same best fitness of –1 at Step 4 as it does at Step 250, the only difference being slightly different coordinates, (75, 76.3158) versus (74.315, 75.4982). The Davg plot in Figure 21 is a beautiful example of the oscillation that seems to signify gravitational trapping, which appears to be why CFO missed the maximum. The minimum value of Davg occurs at Step 3, the step before the fitness jumps to –1. Thereafter it oscillates about values that plateau and then increase in a step-like fashion as seen before. In fact, a very long CFO run (Nt = 30,000, results not shown) confirms that Davg remains flat while oscillating around the value of 0.728 seen in Figure 21. These results are significant because, as in the Goldstein-Price case, CFO’s rapid convergence on an approximation of the global maximum lends itself to an adaptive implementation. CFO may converge very quickly if the decision space shrinks around the approximate maximum located after only a few steps. In this case, for example, running CFO with the decision space truncated to 70 ≤ x1,x2 ≤ 95 with 40 probes distributed uniformly along x1,x2 = 82.5 (dividing the region into four quadrants), CFO locates the global maximum of zero at (75.338, 75.5149) in three steps. If this procedure were implemented, say, after Step 4 where trapping set in, this specific adaptation (all other CFO parameters unchanged, especially Frep) would result in locating the step’s global maximum in only seven steps (320 function evaluations). Adaptation thus may be a very useful technique for developing robust CFO implementations. Another way to influence CFO’s performance is by taking advantage of how Frep affects convergence. The previous run was made with Frep = 0.5 and CFO missed the maximum. Increasing Frep to 0.95 produces a much different result (all other parameters unchanged except Nt = 100). The best fitness and Davg are plotted in Figures 22 and 23,
respectively. The actual global maximum of zero was located at Step 62 at (75.4894, 74.9339), and Davg shows no sign of oscillation, at least through Step 100. Thus, simply adjusting Frep resulted in CFO’s capturing the global maximum with only 2,520 function evaluations.
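For completeness, a sketch of the offset, negated step objective defined at the start of this section (`step_offset` is an illustrative name):

```python
import numpy as np

def step_offset(x, x0=75.123):
    """Offset, negated step function of Yao et al. (1999); global maximum of 0 near x_i = x0."""
    shifted = np.asarray(x, dtype=float) - x0
    return -np.sum(np.floor(shifted + 0.5) ** 2)

print(step_offset(np.full(2, 75.123)))  # 0.0
```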
Figure 19 2D step probes at Step 250, Frep = 0.5 (see online version for colours)
Figure 17 (a) 2D step function on test domain (b) expanded view of 2D step in vicinity of maximum (see online version for colours)
Figure 20 2D step probes at Step 4, Frep = 0.5 (see online version for colours)
Figure 18  2D step on-axis initial probe distribution (see online version for colours)

Figure 21  2D step Davg curve, Frep = 0.5 (see online version for colours)
Figure 22 2D step best fitness, Frep = 0.95 (see online version for colours)
Figure 23  2D step Davg, Frep = 0.95 (see online version for colours)

7.2  30D step

CFO also was tested against the offset 30D step function (Nd = 30, x0 = 75.123). Global maxima of zero are contained in a region around xi = 75.123, i = 1,…,30 on the decision space’s principal diagonal. The first run was made with α = 2, β = 2, G = 2, Δt = 1, Frep = 0.5 and 20 probes uniformly distributed on each of the decision space’s 30 axes (Np = 600). In this case, as in the 2D case, CFO missed the actual maximum. Its best fitness is –1 with 29 coordinates of 75, but the 30th coordinate is 73.6842. The error in the last coordinate is what causes CFO to miss the maximum. Of course, placing the initial probes as far away as possible from the principal diagonal does make for a more challenging search, but the fact remains that CFO missed the target. On the other hand, CFO got close to the maximum very quickly, in only four steps with 3,000 function evaluations. Thus, as in the 2D case, CFO’s convergence on a good ‘approximation’ to the maximum is very rapid, suggesting that perhaps an adaptive approach would work as well in 30 dimensions as it did in two. Figures 24 and 25, respectively, plot the evolution of best fitness and Davg. The fitness has converged to –1 by Step 4, while Davg exhibits large fluctuations, much as it did for the Schwefel Problem 2.26. Davg from a longer run (Nt = 100) is plotted in Figure 26. By Step 7 the curve settles into the oscillatory behaviour seen in other cases apparently signalling a stable, gravitationally trapped probe distribution. As in the 2D case, trapping caused CFO to miss the maximum in 30-dimensions as well.

Figure 24  30D step best fitness, Frep = 0.5 (see online version for colours)

Figure 25  30D step Davg, Frep = 0.5, Nt = 4 (see online version for colours)
Figure 26  30D step Davg, Frep = 0.5, Nt = 100 (see online version for colours)
Figure 27 30D step best fitness, Frep = 0.95 (see online version for colours)
The second CFO run was made with the same parameters except Frep = 0.95, Nt = 250 (CFO was not run adaptively). The best fitness and Davg curves for this case appear in Figures 27 and 28, respectively, and the results are quite different than for Frep = 0.5. The best fitness increases quickly and smoothly to the actual maximum of zero at Step 38 (23,400 function evaluations) at the point (xi = 74.8472, i = 1,..,29, x30 = 74.7553). Table 6 lists the best fitness values, which increase at each step through 38. Figure 28 shows Davg decreasing erratically through Step 37 followed by a sharp jump from 0.0139 at Step 37 to 0.0545 at Step 40. The average distance fluctuates erratically through about Step 100 where a clear oscillation plateau develops through Step 157. After that the oscillation persists with an average value that smoothly increases through the end of the run at Step 250. It appears that trapping has set in, but this time it is the best probe that is trapped at the global maximum. By comparison, the FEP algorithm in Yao et al. (1999) also located the 30D step’s maximum of zero (without offset) with a standard deviation of zero averaged over 50 runs using a population of 100 for 1,500 generations. Thus, FEP achieved the same accuracy as CFO, but with 150,000 function evaluations. However, the CEP algorithm using the same experimental setup returned a mean best value of –577.76 with a standard deviation of 1125.76 [in this paper’s notation, –f(x), for maximisation]. CFO thus outperformed both FEP and CEP by wide margins. This example demonstrates how important the repositioning scheme is in CFO and how much it can influence the algorithm’s performance.
Figure 28 30D step Davg, Frep = 0.95 (see online version for colours)
Table 6  30D step function best fitness

Step    Fitness      Step    Fitness
0       –163,141     20      –13,273
1       –163,141     21      –12,041
2       –140,382     22      –9,880
3       –122,541     23      –7,953
4       –109,509     24      –6,750
5       –97,246      25      –5,909
6       –84,710      26      –4,432
7       –78,584      27      –3,156
8       –69,417      28      –2,430
9       –60,574      29      –1,937
10      –50,513      30      –1,485
11      –48,078      31      –1,093
12      –43,397      32      –761
13      –37,374      33      –480
14      –32,806      34      –277
15      –29,094      35      –125
16      –25,230      36      –33
17      –21,982      37      –30
18      –19,025      38*     0
19      –16,302      250     0
8  30D Rastrigin
Another good example of CFO’s ability to converge rapidly in highly multimodal, high dimensionality decision spaces is provided by the 30D Rastrigin function defined by Yao et al. (1999)

$$f(\vec{x}) = -\sum_{i=1}^{30} \left[ (x_i - x_0)^2 - 10\cos\!\left(2\pi(x_i - x_0)\right) + 10 \right]$$
–5.12 ≤ xi ≤ 5.12, i = 1,..,30, x0 = 1.123. The global maximum is zero at the point xi = 1.123 on the 30D space’s principal diagonal (as usual, offset included to avoid biasing CFO by its initial probe distribution). The Rastrigin’s complexity is illustrated by the 2D version plotted in Figure 29. CFO was run for Nt = 250 steps with α = 2, β = 2, G = 2, Δt = 1 and Frep = 0.985 using six probes on each of the decision space’s axes (Np = 180). The algorithm returned a best fitness of –0.0135014 at Step 210 (37,980 function evaluations) with coordinates between 1.12406 and 1.12463. Table 7 lists the best fitness values at time steps when they changed. A long run of 5,000 steps confirmed that the fitness had saturated at Step 210 as shown in the table. Figure 29 2D Rastrigin function (see online version for colours)
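A sketch of the offset, negated Rastrigin objective defined above (`rastrigin_offset` is an illustrative name):

```python
import numpy as np

def rastrigin_offset(x, x0=1.123):
    """Offset, negated 30D Rastrigin; global maximum of 0 at x_i = x0."""
    shifted = np.asarray(x, dtype=float) - x0
    return -np.sum(shifted ** 2 - 10.0 * np.cos(2.0 * np.pi * shifted) + 10.0)

print(rastrigin_offset(np.full(30, 1.123)))  # ~0.0
```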
Figure 30 plots the best fitness evolution, and Figure 31 plots Davg. The average distance curve exhibits no sign of oscillation and a very long run confirms that none sets in by Step 5000. Thus it appears that CFO has fully converged without trapping. Its best fitness compares very favourably with other reported results. In Yao et al. (1999, Table III) for example, the FEP algorithm located a maximum of –0.046 [in this paper’s notation, –f(x) for maximisation] with standard deviation 0.012. This level of accuracy required 25,000,000 function evaluations (population of 100 evolved for 5,000 generations averaged over 50 runs), compared to CFO’s 37,980 function evaluations that produces a reproducible result that is more accurate by a factor of 3.4.
Table 7  30D Rastrigin best fitness

Step #    Fitness         Neval
0         –120.8340203    180
1         –120.8340203    360
2         –48.6844266     540
7         –45.4314695     1,440
8         –36.3784988     1,620
10        –34.4537187     1,800
17        –18.6338892     3,240
35        –13.5287508     6,480
36        –7.0517136      6,660
37        –2.6744273      6,840
40        –1.0735180      7,380
92        –0.6128153      16,740
95        –0.2164259      17,280
203       –0.0305674      36,720
204       –0.0272906      36,900
205       –0.0136559      37,080
210*      –0.0135014      37,980
5000      –0.0135014      900,180
Figure 30 30D Rastrigin best fitness, Frep = 0.985 (see online version for colours)
Figure 31  30D Rastrigin Davg, Frep = 0.985 (see online version for colours)
CFO’s performance against the 30D Rastrigin is very robust, but it depends heavily on the run parameters. For example, if Frep is decreased ever so slightly, from Frep = 0.985 to Frep = 0.98, then the best fitness decreases from –0.0135 to –0.268. The CFO implementation is thus very sensitive to Frep, so that choosing this parameter is a key factor in how well CFO works. Note, too, that the initial probe configuration and number of probes are also very important in determining how well a run performs because they combine to provide CFO with its initial information about the decision space topology.
9  Conclusions and future work
This paper introduces CFO as a new, nature-inspired search and optimisation metaheuristic invoking the metaphor of gravitational kinematics. Preliminary analysis suggests that CFO is an effective deterministic search algorithm for solving constrained multidimensional optimisation problems in highly multimodal, unimodal, smooth, or discontinuous decision spaces. The paper discusses several aspects of the CFO concept as follows:

1  It provides examples of how well CFO works by running a simple implementation against several widely used test functions in 2- and 30-dimensions. Because CFO is deterministic, every run yields the same results, so that the data presented here can serve as validation cases for researchers developing their own CFO algorithms. As to CFO’s effectiveness, the results speak for themselves and are quite encouraging. The author believes they justify further work on CFO.

2  This paper highlights the importance of two parameters in particular in setting up a CFO run, Np and Frep. Numerical experiments and the examples discussed here suggest that these are the two most important CFO parameters by far, and that the algorithm can be run successfully by appropriately assigning values to these two variables while defaulting the others to α = 2, β = 2, G = 2, Δt = 1 and zero initial acceleration in most cases. Of course, Frep applies only to the CFO implementation described here in which errant probes are repositioned according to the pseudocode in Appendix A.3. If a different errant probe retrieval scheme is used (for example, a ‘reflecting’ decision space boundary), then Frep will not exist as a run parameter, but it likely will be replaced by a new parameter that may or may not be as critical. The reason Np is so important is that it determines how much CFO ‘knows’ about the decision space topology at the beginning of a run. For example, if the initial probe distribution samples the objective function only at points where it is zero, then CFO has no information at all, and all subsequent accelerations will be zero. The probe distribution never will change and the algorithm becomes stuck. Depending on how the initial probes are deployed, for example, on-axis, on-diagonal, in a grid and so on, the number of probes then determines where the objective function is sampled and consequently how much information CFO has to start. Thus, the combination of Np and the initial probe configuration are key factors in how well CFO performs.

3  If indeed Np and Frep are CFO’s most important parameters, then assigning their values might be done automatically using a feedback mechanism. This approach is particularly attractive because it eliminates the user as a de facto component of the algorithm. The question of how the user interacts with an algorithm, any algorithm, not only CFO, is fundamental to how the algorithm performs. A user learns by using an algorithm, and the knowledge and insight gained are applied in various ways to choose effective run parameters. Because each user develops his own unique body of knowledge, the very process of setting up an optimisation run may be more art than science. This is particularly true for CFO in view of its inchoate nature and sensitivity to parameter values. Fortunately, there may be a methodical approach that addresses this problem and that happens to be especially appropriate to CFO. Battiti and Brunato (2007) suggest the use of ‘reactive search’, a sub-symbolic machine learning methodology that changes run parameters in real time. CFO appears to be an ideal candidate for this type of feedback-based parameter tuning because it searches quickly and deterministically. For example, there is no doubt that the correct value of Frep is crucial to good results, but there is no obvious way to assign a value. One approach might be a bisection type calculation in which an arbitrary starting value in the range 0 ≤ Frep ≤ 1 is changed in response to, say, saturation of the best fitness for some number of time steps, or to the onset of oscillation indicating trapping at local maxima. Coupling a bisection calculation, which is very fast, with CFO’s usually rapid convergence might lead to a very robust reactive search implementation. There no doubt are many possible feedback mechanisms that would be effective in setting CFO’s run parameters, the bisection approach being only one idea.

4  The author believes that this paper provides further insight into the nature of the CFO metaphor by pointing out the remarkable similarity between the structure of many CFO average distance curves and the velocity increment needed to avoid gravitational trapping by NEOs such as the asteroid Apophis. Oscillatory behaviour of the Davg curve appears to be a sufficient condition for trapping at a local maximum. Because CFO is deterministic, local trapping is expected. When trapping sets in, it appears to be the same effect as NEO trapping because the Davg and NEO Δv curves look so much alike. While this conclusion is speculative, it seems unlikely that their striking visual similarity is entirely accidental, especially in view of the fact that NEO interaction is a result of real gravity and CFO search uses gravity as its metaphor. It may well be that much of NEO gravitational theory is applicable to CFO directly or with modification or extension.

5  This paper raises the question of what CFO really is by describing it as a ‘gradient-like’ metaheuristic. With the values α = 2, β = 2, for example, and the difference of fitnesses in the definition of CFO ‘mass’ used here, the ratio of mass to the distance between probes looks very much like the square of a directional derivative in the decision space. Of course, it is not one because of the unit step function needed to keep the ‘mass’ positive. But it certainly is similar to a derivative, and, in that sense, CFO may be thought of as a gradient-like deterministic optimiser. This characterisation may be useful as a catalyst for the development of a multidimensional ‘generalised gradient’ as a new mathematical construct. This expanded concept of derivative might include the unit step as an integral part, for example, in order to implement effective CFO searches. The author leaves the question of whether or not this suggestion makes any sense to the mathematicians who hopefully will take an interest in CFO.

6  Of all the questions that CFO raises, how to choose run parameters arguably looms largest. A fast, reliable and repeatable methodology for specifying run parameters is essential to its becoming a truly useful algorithm, but this has been elusive. Some parameter choices are clearly better than others, leading to faster or more accurate results. Indeed, as seen in the examples, changing the repositioning factor Frep even slightly can make the difference between CFO’s converging on global maxima or missing them entirely. But there is no obvious way of assigning parameter values. For the runs reported here, all parameters were chosen empirically, that is, on a trial-and-error basis. The values used provide results good enough to demonstrate CFO’s viability as a search and optimisation metaheuristic, but nothing more. It is essential that the trial-and-error approach be replaced with one resting on a sound analytical foundation.

7  There are many other unresolved questions relating to CFO algorithms generally. In the author’s opinion, the following issues are important ones that merit consideration:
•  Theoretical refinements: This paper presents CFO as a conceptual framework, which at this time lacks a deep theoretical foundation. CFO clearly works well on an empirical basis, but there is no formal proof of convergence. Researchers far better qualified than the author may find this a fruitful area, and it is the author’s hope that this paper will generate the requisite interest.

•  Definition of CFO ‘mass’: The definition of CFO mass in this paper combines the difference of fitness values with the unit step to avoid negative mass. This particular definition happens to be consistent with a ‘gradient-like’ interpretation of CFO’s nature as discussed above. But there are many other possible definitions of ‘mass’, some perhaps fitting with the gradient-like view of CFO and others not. The best definition of CFO mass, if there is an actual ‘best’ definition, is unknown. Perhaps different mass definitions for different types of problems or decision spaces would improve performance dramatically. Exactly what CFO ‘mass’ should be is an open question.

•  CFO exponents: α and β could be hardwired at a value of 2 as discussed above, but doing so may not be optimal. For example, time-varying exponents that in effect increase the gravitational attraction as the algorithm progresses might be effective in clustering probes at the maxima more quickly. Different exponent values in different regions of the decision space may provide improved performance by, for example, increasing the gravity in the vicinity of identified maxima. These are only two of many possibilities that could be considered in adaptively specifying α and β, possibly in a ‘reactive search’ implementation.

•  Gravitational constant: G is a very important CFO parameter, but there is no obvious way to assign its value. Because G = 2 for the runs reported here, there has been no discussion of the effect of changing G (except that negative values work very poorly!). Other results, however, show that some values clearly provide better convergence than others (Formato, 2006; 2007). Perhaps a time-varying gravitational constant would improve CFO’s performance. Or perhaps one that varies with location in the decision space, say, by increasing near identified maxima would be better. These and similar questions are unanswered.

•  Local trapping: There is no doubt that CFO can become trapped in local maxima (not at all surprising because CFO is deterministic). Escaping local maxima when trapping occurs is an important issue, but how best to do it is an open question. As discussed above, deterministic repositioning using Frep may be one answer, but that applies only to the repositioning scheme that relies on Frep. It is only one of many possible approaches. Stochastic optimisers escape local trapping by building in randomness at the expense of efficiency and repeatability. Even though CFO provides the major advantage of being deterministic, it certainly is possible to hybridise the algorithm by injecting some measure of randomness to mitigate trapping. In this sort of approach, random probe repositioning may be used just long enough to escape trapping followed by a return to the deterministic CFO mode. This approach and others, for example, speciation (Parrott and Li, 2006), merit a closer look.

•  Probes and time steps: Several questions arise about the number of probes, Np, and the number of time steps, Nt. While as a general proposition it may seem intuitive that more probes working for a longer time should be better than fewer probes for a shorter time, this idea seems to be incorrect. Arbitrarily increasing Np and Nt does not improve CFO’s performance. Rather, these parameters must be carefully chosen, especially Np. As pointed out above, the number of probes and type of initial probe distribution determine what information is available about the decision space topology.
Some general comments may be useful, with the understanding that in the end they may not be correct. It seems reasonable, for example, to increase the number of probes in proportion to how multimodal the decision space is. ‘Smooth’ spaces seem to lend themselves to fewer probes, whereas highly multi-modal spaces need more. It may be possible to relate Np to some measure of the decision space’s modality (for example, the interdecile fitness range or the fitness’s standard deviation computed for a set of ‘sampling probes’ that characterise the decision space). Whether or not a run should be terminated by specifying Nt or by some other mechanism is an important issue. The examples show that CFO often converges very quickly, so that unnecessarily long runs waste resources. But, of course, there always remain the questions, what if the program ran for, say, 50 more steps? How would the results change? As probe position plots show, in some cases convergence is evident in a very small number of steps. But in others even large values of Nt may not reveal a visually apparent convergence even though the global maxima actually have been located. •
•	Termination criteria: If a CFO run is not made for a predetermined number of time steps, then some termination criterion must be specified. What constitutes an effective termination criterion is an important unanswered question. There are many possibilities, among them:
a	a fairly static probe distribution over several time steps
b	saturation of the best fitness over many time steps
c	clustering of fitness values in some fraction of the total range (for example, the upper decile, if the total range is not too narrow)
d	narrowing of the fitness range to some sufficiently small interval.
There are many other possibilities, those listed being representative, not exhaustive.
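For example, criterion (b) could be implemented as a simple saturation test; the window length and tolerance below are assumptions chosen only for illustration.

def fitness_saturated(best_history, window=25, tol=1e-10):
    """Hypothetical implementation of termination criterion (b):
    stop when the best fitness has improved by less than 'tol' over
    the last 'window' time steps.  Both values are illustrative."""
    if len(best_history) < window:
        return False
    return (best_history[-1] - best_history[-window]) < tol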
•	Optimising the optimiser: In the author's opinion, the examples in this and the other CFO papers (Formato, 2006, 2007, 2008) suggest convincingly that the CFO metaphor has merit. CFO works, and with properly chosen run parameters, it works quite well. CFO can converge to the vicinity of a global maximum very quickly, often requiring far fewer function evaluations than other algorithms. Because CFO runs quickly, it may be possible to determine an optimum set of CFO run parameters by using reactive search or possibly even another optimiser as a pre-processor. In this sort of implementation the objective function would be embedded in a CFO algorithm that is itself embedded in another CFO (or other optimiser) algorithm. While this approach perhaps is unconventional and somewhat brute force, it nevertheless may merit consideration because CFO can converge very quickly under appropriate circumstances.
•	Other strategies: Depending on the goal of the CFO search, different strategies and implementations may be desirable. For example, if the goal is to find as many maxima as possible, both local and global, then excluding regions around maxima that have already been located presumably will permit CFO to locate other groups of good fitnesses (a 'swiss cheese' implementation). Groups of solutions often are valuable in real-world problems because of the trade-offs that usually are involved. In the case of discontinuous objective functions, applying CFO iteratively by excluding regions where there are clusters of equal-valued global maxima, for example, might result in better performance in locating other clusters of local maxima. A species-based CFO implementation, perhaps similar to SPSO (Parrott and Li, 2006), also may provide good results.
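A hedged illustration of the 'swiss cheese' idea (the exclusion test, radius and penalty value are assumptions, not a scheme specified in this paper): penalise the fitness inside exclusion balls around maxima already found, then rerun CFO on the modified objective.

def exclude_found_maxima(f, found, radius=0.05, penalty=-1e9):
    """Hypothetical 'swiss cheese' wrapper around an objective f.

    found   : list of previously located maxima (coordinate lists)
    radius  : exclusion radius around each located maximum (assumed)
    penalty : fitness assigned inside an excluded region (assumed)
    Returns a new objective that CFO can be rerun against to look for
    other groups of good fitnesses.
    """
    def f_excluded(x):
        for m in found:
            if sum((xi - mi) ** 2 for xi, mi in zip(x, m)) <= radius ** 2:
                return penalty
        return f(x)
    return f_excluded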
•	The CFO implementation in this paper computes a gravitational acceleration for each probe at each time step that depends on the locations of all the other probes at that time step. Perhaps including all, or some portion, of the previous probe locations would improve convergence. Doing so is straightforward, but it might result in a substantial computational penalty. Equation (A-4) would have to be changed to a double sum over the probes and also over previous time steps. Nevertheless, this modification might be useful for runs where Np and Nt are not too large.
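One way such a history-dependent acceleration might be written (a sketch under the assumption that all past steps are weighted equally, not a form given in this paper) is a double sum over the other probes and the previous time steps:

\vec{a}_{j-1}^{\,p} = G \sum_{l=0}^{j-1} \sum_{\substack{k=1 \\ k \neq p}}^{N_p} U\!\left(M_l^k - M_{j-1}^p\right)\left(M_l^k - M_{j-1}^p\right)^{\alpha}\,\frac{\vec{R}_l^k - \vec{R}_{j-1}^p}{\left|\vec{R}_l^k - \vec{R}_{j-1}^p\right|^{\beta}}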
•	Partitioning the decision space or normalising it to a fixed size, say, 0 ≤ xi ≤ 1, i = 1,…,Nd, may improve performance. Similarly, normalising the fitnesses at each time step might help. Limiting the magnitude of the acceleration relative to, say, the probe's location within the decision space might result in better performance by preventing 'over-acceleration' of the probes, which causes them either to fly out of the space or over nearby solutions. The question of how best to deal with errant probes also raises many issues. Should the decision space boundary be 'reflecting' or 'absorbing'? Or should the errant probe be reinserted somewhere in the decision space and, if so, how?
All of these questions remain unresolved. CFO is in its infancy as a search and optimisation algorithm and there remain many important unresolved issues concerning how CFO algorithms should be implemented. The author hopes that this paper will inspire further work on CFO that addresses these and other questions so that CFO reaches its full potential as an effective optimisation metaheuristic.
Acknowledgements The author wishes to express his sincere appreciation to Professor Zhihua Cui, IJBIC’s editor-in-chief, for providing him with the opportunity to introduce CFO to the optimisation community as a novel and largely untested metaheuristic. If CFO realises what hopefully is its high potential, it will be in large measure due to Dr. Cui’s foresight. If, on the other hand, CFO fails to meet this expectation, the responsibility is entirely the author’s. The author also wishes to thank Professor Mario Pavone of the University of Catania, Sicily, Italy, for giving him the opportunity to introduce CFO at NICSO 2007.
References
Battiti, R. and Brunato, M. (2007) 'Reactive search: machine learning for memory-based heuristics', Ch. 21 in Gonzalez, T.F. (Ed.): Approximation Algorithms and Metaheuristics, Taylor & Francis Books (CRC Press), Washington, DC, available at http://rtm.science.unitn.it/~battiti/archive/chap21.pdf.
Brand, L. (1966) Differential and Difference Equations, John Wiley & Sons, Inc., New York, NY.
Cui, Z., Zeng, J. and Sun, G. (2006) 'A fast particle swarm optimization', Int'l. J. Innovative Computing, Information and Control, December, Vol. 2, No. 6, pp.1365–1380.
Dorigo, M., Maniezzo, V. and Colorni, A. (1991) Positive Feedback as a Search Strategy, Dipartimento di Elettronica, Politecnico di Milano, Italy, Tech. Rep. 91-016, available at http://iridia.ulb.ac.be/~mdorigo/pub_x_subj.html.
Formato, R.A. (2006, et seq.) Copyright Reg. nos. TX 6459271, 6461552, 6468062, 6464965, 6522082, 6540042, 6603758, 6823122, US Library of Congress, Washington, DC, available at http://www.copyright.gov.
Formato, R.A. (2007) 'Central force optimization: a new metaheuristic with applications in applied electromagnetics', Progress in Electromagnetics Research, Vol. PIER 77, pp.425–491, available at http://ceta.mit.edu/PIER/pier.php?volume=77.
Formato, R.A. (2008) 'Central force optimization: a new computational framework for multidimensional search and optimization', in Krasnogor, N., Nicosia, G., Pavone, M. and Pelta, D. (Eds.): Studies in Computational Intelligence (SCI), Vol. 129, pp.221–238, Springer-Verlag, Heidelberg, available at www.springerlink.com, ISBN 978-3-540-78986-4.
Goldstein, H. (1965) Classical Mechanics, 7th printing, Addison-Wesley Publishing Co. Inc., Boston.
Hamdan, S.A. (2008) 'Hybrid particle swarm optimizer using multi-neighborhood topologies', INFOCOMP, J. Comp. Sci., Dept. Computer Science, Federal Univ. of Lavras (Brazil), March, Vol. 7, No. 1, pp.36–43, available at http://www.dcc.ufla.br/infocomp/artigos/v7.1/vol7.1.htm.
Kennedy, J. and Eberhart, R. (1995) 'Particle swarm optimization', Proc. IEEE Conf. on Neural Networks, November/December, Vol. 4, pp.1942–1948.
Marion, J. (1970) Classical Dynamics of Particles and Systems, 2nd ed., Harcourt Brace Jovanovich, New York, NY.
Parrott, D. and Li, X. (2006) 'Locating and tracking multiple dynamic optima by a particle swarm model using speciation', IEEE Trans. Evol. Comp., August, Vol. 10, No. 4, pp.440–458.
Schweickart, R., Chapman, C., Durda, D., Hut, P., Bottke, B. and Nesvorny, D. (2006) Threat Characterization: Trajectory Dynamics (White Paper 039), arXiv:physics/0608155v1, available at http://arxiv.org/abs/physics/0608155.
Valsecchi, G.B., Milani, A., Gronchi, G.F. and Chesley, S.R. (2003) 'Resonant returns to close approaches: analytical theory', Astronomy & Astrophysics, Vol. 408, No. 3, pp.1179–1196, DOI: 10.1051/0004-6361:20031039, available at http://www.aanda.org.
Yao, X., Liu, Y. and Lin, G. (1999) 'Evolutionary programming made faster', IEEE Trans. Evol. Comp., July, Vol. 3, No. 2, pp.82–102.
Appendix

A.1 CFO theory

CFO provides a conceptual algorithmic framework for solving the following problem: in a decision space defined by $x_i^{\min} \leq x_i \leq x_i^{\max}$, $i = 1, \ldots, N_d$, the $x_i$ being 'decision variables', determine the locations of the global maxima of the objective function $f(x_1, x_2, \ldots, x_{N_d})$. The value of $f(\vec{x})$ at each point $\vec{x}$ is referred to as its 'fitness'. The objective function's topology is unknown. It may be continuous or discontinuous, highly multimodal or unimodal, and possibly subject to a set of constraints $\Omega$ among the decision variables.

CFO is based on the metaphor of gravitational kinematics, the branch of physics that models the behaviour of masses moving solely under the influence of the force of gravity. The gravitational field is a conservative, linear, vector force field derivable from a scalar potential function. The 'action at a distance' force of gravity thus may be computed directly as a vector quantity from Newton's universal law of gravitation or, alternatively, as the negative gradient of a scalar gravitational potential function. These distinct approaches highlight different conceptual aspects of the CFO analogy as discussed in the body of this paper.

In the physical universe, the magnitude of the force of gravity between two masses m1 and m2 separated by a distance r is given by Marion (1970, §2.7)

F = \gamma \frac{m_1 m_2}{r^2}    (A-1)

where γ > 0 is the 'gravitational constant'. Gravity acts along the line connecting the masses' centres, making it, by definition, a 'central force'. Hence the name central force optimisation to describe this new search and optimisation metaheuristic. Gravity always attracts masses toward each other. Unlike the Coulomb electric force, which may be attractive or repulsive depending on the signs of the charges, gravity never is repulsive. In the real world, gravity always accelerates masses toward each other. The vector acceleration experienced by mass m1 due to mass m2 is given by Newton's second law of motion as

\vec{a}_1 = -\gamma \frac{m_2 \hat{r}}{r^2}    (A-2)

where $\hat{r}$ is a unit vector pointing towards mass m1 from mass m2 along the line joining m1 and m2. A constant acceleration applied during the time interval t to t + Δt changes a mass's location in three-dimensional space according to (Brand, 1966, §56)

\vec{R}(t + \Delta t) = \vec{R}_0 + \vec{V}_0 \Delta t + \frac{1}{2}\vec{a}\,\Delta t^2    (A-3)

The mass's position is $\vec{R}(t + \Delta t)$ at time t + Δt, where $\vec{R}_0$ and $\vec{V}_0$, respectively, are the position and velocity vectors at time t. In a standard right-handed 3-D Cartesian coordinate system, the position vector is given by $\vec{R} = x\hat{i} + y\hat{j} + z\hat{k}$, where $\hat{i}$, $\hat{j}$, $\hat{k}$ are the unit vectors along the x, y and z axes, respectively.

Equations (A-2) and (A-3) are referred to as the 'equations of motion'. In the CFO metaphor they will be generalised from 3 to Nd dimensions because CFO searches Nd-dimensional spaces for the extrema of an objective function to be maximised. However, for the purpose of explaining the CFO analogy, the 3-dimensional decision space in Figure A-1 will be used to provide a concrete visualisation.

CFO 'flies' a group of probes through the decision space along trajectories determined by the generalised equations of motion at a set of discrete time steps. In this example, the position of each probe at each step is specified by its position vector $\vec{R}_j^p$, where the indices p and j are the probe number and time step number, respectively. Probe p moves from position $\vec{R}_{j-1}^p$ at time step j−1 to position $\vec{R}_j^p$ at time step j, with the 'time' interval between steps j−1 and j being Δt. In an Nd-dimensional decision space, the generalised position vector is $\vec{R}_j^p = \sum_{k=1}^{N_d} x_k^{p,j}\hat{e}_k$, where the $x_k^{p,j}$ are probe p's coordinates at time step j, and $\hat{e}_k$ is the unit vector along the $x_k$ axis.

The objective function to be maximised is defined on the Nd-dimensional decision space. The value of the objective function at each point along a probe's trajectory is the 'fitness' at that point. At time step j−1 at probe p's location, for example, the fitness is given by $M_{j-1}^p = f(x_1^{p,j-1}, x_2^{p,j-1}, \ldots, x_{N_d}^{p,j-1})$. Every other probe also has a fitness $M_{j-1}^k$, $k = 1, \ldots, p-1, p+1, \ldots, N_p$, associated with it, where Np is the total number of probes. In Figure A-1, the fitness at each probe's location is represented by the size of the blackened circle at the tip of the position vector, the metaphorical correspondence being to the size of a 'planet' in space. Ranked from largest to smallest, the fitnesses in Figure A-1 are located at $\vec{R}_{j-1}^s$, $\vec{R}_j^p$, $\vec{R}_{j-1}^n$ and $\vec{R}_{j-1}^p$, respectively, the ranking being reflected in the relative size of the circles at the tip of each position vector. The probe number is 1 ≤ p ≤ Np, while the time step number is 0 ≤ j ≤ Nt, where Nt is the total number of time steps. As time progresses the probes fly through the decision space along trajectories governed by the equations of motion.

The force of gravity in CFO is created by a mass that is a user-defined function of the fitness at each of the other probes' locations. It is important to note that the symbol $M_j^k$ used for probe k's fitness at time step j must not be confused with 'mass' in CFO. While larger circles in Figure A-1 do correspond to greater fitness values, CFO's 'mass' is not the same as the fitness for reasons discussed below. How probe p moves from position $\vec{R}_{j-1}^p$ to $\vec{R}_j^p$ is determined by its initial position and by the total acceleration produced by the 'masses' created by the user-defined function of the fitnesses at each of the other probes' locations.

Figure A-1  Typical 3-D CFO decision space (figure not reproduced here)

The 'acceleration' experienced by probe p due to probe n for the CFO implementation described here is given by

G \cdot U(M_{j-1}^n - M_{j-1}^p) \cdot (M_{j-1}^n - M_{j-1}^p)^{\alpha} \cdot \frac{\vec{R}_{j-1}^n - \vec{R}_{j-1}^p}{\left|\vec{R}_{j-1}^n - \vec{R}_{j-1}^p\right|^{\beta}}

Similarly, probe p is subject to an acceleration due to probe s that is given by

G \cdot U(M_{j-1}^s - M_{j-1}^p) \cdot (M_{j-1}^s - M_{j-1}^p)^{\alpha} \cdot \frac{\vec{R}_{j-1}^s - \vec{R}_{j-1}^p}{\left|\vec{R}_{j-1}^s - \vec{R}_{j-1}^p\right|^{\beta}}.

Following standard notation, the vertical bars denote vector magnitude, $|\vec{A}| = \left(\sum_{i=1}^{N_d} a_i^2\right)^{1/2}$, where the $a_i$ are the scalar components of vector $\vec{A}$. G is a very important CFO parameter, the 'gravitational constant' (G > 0) corresponding to γ in equation (A-1) [note that the minus sign in (A-2) has been absorbed into the order in which the differences in the acceleration expressions are taken]. The terms in the numerator containing the objective function fitnesses, for example, $(M_{j-1}^s - M_{j-1}^p)^{\alpha}$, correspond loosely to the mass in (A-2). CFO's acceleration expression is quite different from what it is in physical space in three important ways:
a	the definition of mass
b	inclusion of the exponents α > 0, β > 0
c	inclusion of the unit step U(·).

In this particular CFO implementation, the user-defined function that is CFO's 'mass' begins with the difference of fitnesses, not the fitness values themselves. The algorithm designer is free to choose any function of the fitnesses, and different definitions may result in better performance against certain objective functions. One possibility, for example, might be a ratio of fitnesses or their differences, a notion reminiscent of the 'reduced mass' concept in gravitational kinematics, but only the difference-of-fitnesses definition will be used in this paper. The use of the fitness difference $M_{j-1}^s - M_{j-1}^p$ in this CFO implementation is intended to avoid excessive gravitational 'pull' by other very close probes. It is likely that nearby probes in the decision space have similar fitness values, which may lead to an excessive gravitational force on the subject probe. The difference of fitnesses intuitively seems to be a better measure of how much gravitational influence there should be by the probe with a greater fitness on the probe with a smaller one.

In physical space α and β take on the specific values 1 and 3, respectively [note that the CFO numerator does not contain a unit vector like (A-2)]. The variation of gravitational acceleration with mass and distance in the universe follows an immutable law. But in metaphorical 'CFO space' the algorithm designer is free to assign a completely different variation with mass and distance, whose consequences can be quite dramatic. This flexibility is included in the free parameters α and β. CFO test runs reveal that the algorithm's convergence is sensitive to the exponent values and that some values of these exponents are better than others.

The third difference is the inclusion of the unit step function U(z) = 1 for z ≥ 0 and U(z) = 0 otherwise. Because CFO space is metaphorical, it can be a strange place in which physically unrealisable objects exist. Real mass must be positive, but not so mass in CFO space. In this CFO implementation, if the unit step were not included, then mass could be positive or negative depending on which fitness is greater in computing the fitness difference. The unit step function eliminates the possibility of 'negative' mass, thereby creating only positive masses. This attribute is quite important in any CFO implementation, because only positive mass is attractive in nature. If CFO is implemented with the possibility of negative mass, the corresponding accelerations are repulsive instead of attractive. The effect of a repulsive gravitational force is to fly probes away from large fitness values instead of toward them, just the opposite of what the algorithm is intended to do.

The expressions above are the accelerations experienced by probe p due only to probes n and s. But probe p's trajectory is determined by the gravitational influence of all the other probes. The total acceleration experienced by p as it 'flies' from position $\vec{R}_{j-1}^p$ to $\vec{R}_j^p$ thus is given by summing over all other probes:

\vec{a}_{j-1}^p = G \sum_{\substack{k=1 \\ k \neq p}}^{N_p} U(M_{j-1}^k - M_{j-1}^p)\,(M_{j-1}^k - M_{j-1}^p)^{\alpha}\,\frac{\vec{R}_{j-1}^k - \vec{R}_{j-1}^p}{\left|\vec{R}_{j-1}^k - \vec{R}_{j-1}^p\right|^{\beta}}    (A-4)

The new position vector for probe p at time step j therefore becomes:

\vec{R}_j^p = \vec{R}_{j-1}^p + \frac{1}{2}\vec{a}_{j-1}^p\,\Delta t^2, \quad j \geq 1    (A-5)

(A-4) is analogous to (A-2), while (A-5) is analogous to (A-3). These four equations embrace the gravitational kinematics metaphor on which CFO is based.

Note that equation (5) in Formato (2007) contains a 'velocity' term $\vec{V}_{j-1}^p$ that intentionally has been omitted from (A-5) above. The reason for this change is that numerical experiments performed after that paper was published revealed that including the velocity term as written actually impeded CFO's convergence. This behaviour is not understood at this time. In §4, p.431, of that paper the velocity term already had been set to zero as a matter of convenience. Consequently the term $\vec{V}_{j-1}^p$ has been dropped from (A-5) in this formulation of CFO theory, which is an important change. Note, too, that the velocity term, the coefficient 1/2 and Δt in equation (5) of Formato (2007) were retained only to preserve the analogy to gravitational kinematics. None of these variables is required (of course, Δt cannot be zero). A constant value of Δt and the factor 1/2 simply should be absorbed into the gravitational constant G. Varying Δt, which changes the interval at which probes 'report' their positions, likely will influence how CFO converges, but Δt's influence has not been investigated and consequently remains an open question.

The formulation of CFO theory presented here is based on the vector model of the gravitational force field. Another approach would be to compute the force of gravity as the negative gradient of a scalar gravitational potential [see, for example, Marion (1970), §2.7 or Goldstein (1965), §3.6]. While the results are the same as the vector model, the gravitational potential approach emphasises a different aspect of the CFO gravitational metaphor. Because the gradient is the derivative (slope) of the gravitational equipotential surface in the direction of maximum rate of change, conceptually CFO may be thought of as a form of deterministic gradient-based search. In the implementation described here, for example, the factor

\frac{(M_{j-1}^s - M_{j-1}^p)^{\alpha}}{\left|\vec{R}_{j-1}^s - \vec{R}_{j-1}^p\right|^{\beta}}

looks very much like a directional derivative (difference of fitnesses divided by distance between evaluation points). But because α and β can take on any value assigned by the user, this ratio perhaps is best described as a 'generalised directional derivative'. Interpreting CFO as a generalised gradient-like methodology may be useful for suggesting potentially useful analytical approaches and also for understanding how CFO works. This conceptual framework may be attractive to researchers more accustomed to thinking in terms of derivatives.
A.2 CFO implementation

The CFO algorithm comprises equations (A-4) and (A-5). It is simple and easily implemented in a compact computer program. The basic steps are:
a	compute initial probe positions, the corresponding objective function fitnesses, and assign initial accelerations
b	successively compute each probe's new position using (A-5) based on previously computed accelerations using (A-4)
c	verify that each probe is located inside the decision space, making corrections as required
d	update the fitness at each new probe position
e	compute accelerations for the next time step based on the new positions
f	loop over all time steps or until some termination criterion has been met.
Because CFO may 'fly' a probe outside the decision space domain into regions of unallowable solutions, such errant probes should be returned to the decision space. While there are many possible retrieval schemes, the one used for the runs reported here is simple and deterministic as described below (see Formato, 2008, for details).

A.3 CFO pseudocode

A.3.1 Data structures
Create arrays R(p, i, j) [position vectors] and A(p, i, j) [accelerations], 1 ≤ p ≤ Np, 1 ≤ i ≤ Nd, 0 ≤ j ≤ Nt [probe #, coordinate #, time step #]. Create the fitness array M(p, j) = f(R(p, i, j)).

A.3.2 Initialisation
At time step 0:
a	initial probe distribution, for example:
	a(1) uniform probes on each coordinate axis:
		for i = 1 to Nd, n = 1 to Np/Nd:
			p = n + (i − 1)·Np/Nd
			R(p, i, 0) = x_i^min + (n − 1)(x_i^max − x_i^min)/(Np/Nd)
		(Np/Nd, # probes/dimension, user-specified)
	a(2) probes slightly off the decision space diagonal:
		for p = 1 to Np, i = 1 to Nd:
			R(p, i, 0) = x_i^min + (x_i^max − x_i^min)·[Nd(p − 1) + i − 1]/(Np·Nd − 1)
	a(3) other user-specified probe distribution
b	initial acceleration A(p, i, 0) = 0, 1 ≤ p ≤ Np, 1 ≤ i ≤ Nd
c	initial fitness M(p, 0) = f(R(p, i, 0)), 1 ≤ p ≤ Np

A.3.3 Time loop
a	New probe positions, for p = 1 to Np, i = 1 to Nd:
		R(p, i, j) = R(p, i, j − 1) + ½·A(p, i, j − 1)·Δt², with Δt² = 1
b	Retrieve errant probes as required:
		If R(p, i, j) < x_i^min then R(p, i, j) = x_i^min + Frep·(R(p, i, j − 1) − x_i^min)
		If R(p, i, j) > x_i^max then R(p, i, j) = x_i^max − Frep·(x_i^max − R(p, i, j − 1))
	[Note: repositioning factor, 0 ≤ Frep ≤ 1, see Formato (2008)]
c	Update fitness matrix, for p = 1 to Np:
		M(p, j) = f(R(p, i, j))
d	Update accelerations, for p = 1 to Np, i = 1 to Nd:
		A(p, i, j) = G · Σ_{k=1, k≠p}^{Np} U(M(k, j) − M(p, j)) × (M(k, j) − M(p, j))^α × [R(k, i, j) − R(p, i, j)] / |R_j^k − R_j^p|^β
		where |R_j^k − R_j^p| = [Σ_{m=1}^{Nd} (R(k, m, j) − R(p, m, j))²]^{1/2}
e	Increment j → j + 1 and repeat from step (a) of the time loop until j = Nt or some other stopping criterion has been met.
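For readers who prefer working code to pseudocode, the following compact Python sketch (not from the original paper) implements the steps above under a few stated assumptions: the 'probes off the diagonal' initial distribution a(2), Δt² = 1, the unit-step mass definition of (A-4), and illustrative default values for G, α, β, Frep, Np and Nt.

import math

def cfo_maximise(f, x_min, x_max, Np=20, Nt=100,
                 G=2.0, alpha=2.0, beta=2.0, Frep=0.5):
    """Compact CFO sketch following the pseudocode of Appendix A.3.

    f            : objective function of a list of Nd coordinates
    x_min, x_max : per-coordinate decision space bounds
    The default parameter values are illustrative assumptions, not
    recommendations from the paper.  Returns (best fitness, best x).
    """
    Nd = len(x_min)
    # A.3.2 a(2): probes placed slightly off the decision space diagonal.
    R = [[x_min[i] + (x_max[i] - x_min[i]) * (Nd * p + i) / (Np * Nd - 1)
          for i in range(Nd)] for p in range(Np)]
    A = [[0.0] * Nd for _ in range(Np)]      # A.3.2 b: zero initial accelerations
    M = [f(R[p]) for p in range(Np)]         # A.3.2 c: initial fitnesses
    best_fit = max(M)
    best_x = list(R[M.index(best_fit)])

    for _ in range(Nt):
        R_new = [[0.0] * Nd for _ in range(Np)]
        for p in range(Np):
            for i in range(Nd):
                xi = R[p][i] + 0.5 * A[p][i]          # A.3.3 a, with Delta t^2 = 1
                # A.3.3 b: deterministic retrieval of errant probes,
                # repositioned relative to the previous coordinate R[p][i].
                if xi < x_min[i]:
                    xi = x_min[i] + Frep * (R[p][i] - x_min[i])
                elif xi > x_max[i]:
                    xi = x_max[i] - Frep * (x_max[i] - R[p][i])
                R_new[p][i] = xi
        R = R_new
        M = [f(R[p]) for p in range(Np)]              # A.3.3 c: update fitnesses
        if max(M) > best_fit:
            best_fit = max(M)
            best_x = list(R[M.index(best_fit)])
        # A.3.3 d: update accelerations according to equation (A-4).
        for p in range(Np):
            for i in range(Nd):
                acc = 0.0
                for k in range(Np):
                    if k == p or M[k] <= M[p]:
                        continue          # unit step U(.): positive 'mass' only
                    dist = math.sqrt(sum((R[k][m] - R[p][m]) ** 2
                                         for m in range(Nd)))
                    if dist == 0.0:
                        continue
                    acc += ((M[k] - M[p]) ** alpha *
                            (R[k][i] - R[p][i]) / dist ** beta)
                A[p][i] = G * acc
    return best_fit, best_x

As a rough check, cfo_maximise(lambda x: -(x[0] ** 2 + x[1] ** 2), [-5.0, -5.0], [5.0, 5.0]) should pull the probes toward the maximum of this simple sphere-type function at the origin; it is intended only as a starting point for experimentation, not as a tuned implementation.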