
Local Interaction Evolution Strategies for Design Optimization

Martina Gorges-Schleuter, Ingo Sieber, Wilfried Jakob
Forschungszentrum Karlsruhe, Institute for Applied Computer Science
P.O. Box 3640, D-76021 Karlsruhe, Germany
gorges@iai.fzk.de

Abstract- This paper studies and describes two classes of local interaction Evolution Strategies (LES) where the choice of parents for mating as well as the decision for survival and replacement are performed locally only. The entire population is now considered to consist of spatially distributed individuals, each belonging to its own overlapping neighborhood, and selection operates on a set of local pools. The specific real-world application considered is the optimization of a 2-lens-system being part of a heterodyne receiver, a microoptical communication module. The goal is to optimize the system to be as insensitive to fabrication tolerances as possible while still maintaining optimal properties. This task is highly multi-modal and multi-objective too. The evaluation of each design variant is based on a simulation model and thus is a time-consuming task.

0-7803-5536-9/99/$10.00 ©1999 IEEE

1 Introduction

The production of specimens for microsystems or microcomponents is both time- and material-consuming due to the sophisticated manufacturing techniques. In a traditional design process the number of possible variations which can be considered is very limited. Consequently, the manufacturing step should be preceded by simulations, the results of which may constitute a basis for making a laboratory specimen. Measurements conducted on laboratory specimens furnish data for comparison to validate the simulation model and to learn about the microsystem's behavior as well. Thus, in microsystem technology computer-based design techniques become more and more important, similar to the development of microelectronics. The computer-aided development and optimization is based on simulation models. These must be fast enough to compute and must be parameterizable. In addition they need to be accurate enough, as the quality of an optimization depends highly on the quality of the simulation model. For the difficult task of optimization we promote the application of evolutionary algorithms (EA).

The entire population is now considered to consist of individuals uniformly distributed over a geographic region which might be linear, planar, or spatial. Interactions of individuals, especially the selection of partners for recombination and the selection of descendants for survival, are restricted to individuals geographically located nearby, the local overlapping neighborhoods. The generation transition is performed by substituting an individual $a_i$ at a geographic place $i$ by a locally generated descendant. This model is now widely referred to as the diffusion model; the name reflects the process by means of which information, i.e. the characters the individuals carry, may propagate over time through the population.

The essential difference to the traditional EAs using global knowledge and global rules is the collective behavior of the individuals limited to local rules of interaction. Besides their favorable properties concerning parallelization, it was shown for the diffusion model Genetic Algorithms that the distributed locally interacting models behave differently from their sequential counterparts. They promote global diversity and, especially in those cases where we have a multi-modal, nonlinear environment, frequently give better results [1, 2, 3, 11].

A diffusion model ES was first introduced by Sprave [5]. His LInear Cellular Evolution strategy (LICE) uses a ring population structure and adapts the selection model of standard ES to the neighborhood model, i.e. the parents of any descendant are drawn randomly from the neighborhood. Another type of diffusion model ES has been proposed by Gorges-Schleuter [6], suggesting to choose the center individual of a neighborhood as one parent and to draw the other parent randomly from the neighborhood. Independently of the particular type of local selection chosen, both studies showed that the highest benefit of the local interaction ES (LES) is gained when the neighborhood sizes are extremely small.

This paper starts with a review of the notation and algorithmic formulation of the local interaction ES. Next we consider a phenomenon specific to LES, namely the frequent occurrence of self-mating due to the small size of the local neighborhoods. A growth curve analysis and an analysis of the decrease in variation in the population over time are used to examine the respective selective pressure induced by the particular form of parent selection and the population structure.
Our real-world application, the optimization of a 2-lens-system being part of a heterodyne receiver, a microoptical communication module, is based on a simulation model and thus is a time-consuming task. The 2-lens-problem is highly multi-modal and multi-objective too. Hence we continue with an optimization study of two benchmark problems: the Rastrigin function and Shekel's foxholes problem. These multi-modal problems are known to be difficult for the standard Evolution Strategies, but can be solved by using LES. Our specific interest is in the effects of self-mating and in the influence of the population structure and the neighborhood size. With the results of the numerical study at hand we are able to set up an optimization scene for the 2-lens-system. We present optimization results for two forms of LES and compare these to results gained by ES and a hillclimber with respect to both convergence reliability and convergence velocity.

2 The Diffusion Model ES

2.1 The Notation

Schwefel and Rudolph introduced in [13] a notation for contemporary Evolution Strategies. They are now formally referred to as

$(\mu, \kappa, \lambda, \rho)$-ES,

with $\mu \ge 1$ the number of parents, $\kappa \ge 1$ the upper limit for life span, $\lambda > \mu$ if $\kappa = 1$ the number of descendants, and $1 \le \rho \le \mu$ the number of ancestors for each descendant. For $\kappa = 1$ this resembles the comma-ES where parents of the next generation are chosen from the descendants only. $\kappa = \infty$ denotes the plus-ES where the $\mu$ parents of the next generation are selected from both the current pool of parents and their descendants. Introducing a life span thus allows a gradual transition between the two extreme forms, the $(\mu, \lambda)$-ES and the $(\mu + \lambda)$-ES.

We define a local selection Evolution Strategy (LES) as

$(\mu, \kappa, \lambda/\mu, \rho, \nu)$-LES,

with $\nu$ denoting the size of the neighborhood, $\lambda/\mu \in \mathbb{N}$ the number of locally generated offspring individuals competing for survival, and $\rho \le \nu$ the number of ancestors.

Let $I = A_x \times A_s$ denote the space of individuals with $A_x$ as the set of object variables and $A_s$ as the set of strategy parameters. The individuals $a_i$ are placed on the sites $i$ of a regular spatial structure. In this study the considered population structure is either a ring or a square torus. The neighborhood of any individual $a_i \in I$, $1 \le i \le \mu$, consists of the center individual and the neighbors geographically located nearby. Thus, the neighborhood is the reproduction community of a center individual $a_i$, forming the local pool in which selection and recombination operate.

In the one-dimensional case of a ring population structure the neighborhood of an individual $a_i$ consists of the center individual and its $2r$ nearest neighbors, where $r$ is the radius. In the two-dimensional case we consider here the von Neumann neighborhood, including the center and the four nearest neighbors, and the Moore neighborhood, including the center individual and the eight individuals placed on the surrounding sites of the center.

In the following we assume $\kappa = 1$ and $\rho = 2$, i.e. we concentrate on the comma-ES and use binary recombination. Thus, for short, we refer to a diffusion model ES with a particular population structure as $(\mu, \lambda/\mu, \nu)$-population structure.
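The neighborhoods described above can be illustrated by a minimal Python sketch. The function names, the zero-based site indices, and the row-major numbering of torus sites are assumptions made for this illustration only; they are not taken from the paper or from LICE.

```python
def ring_neighborhood(i, mu, r):
    """Indices of the center i and its 2*r nearest neighbors on a ring of mu sites."""
    return [(i + d) % mu for d in range(-r, r + 1)]


def torus_neighborhood(i, side, kind="von_neumann"):
    """Indices of the center i and its neighbors on a side x side torus.

    Sites are numbered row by row (an assumption of this sketch).
    """
    row, col = divmod(i, side)
    if kind == "von_neumann":      # center plus the 4 nearest neighbors, nu = 5
        offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    elif kind == "moore":          # center plus the 8 surrounding sites, nu = 9
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
    else:
        raise ValueError(f"unknown neighborhood kind: {kind}")
    return [((row + dr) % side) * side + (col + dc) % side for dr, dc in offsets]
```

With these definitions a ring of radius r yields a neighborhood of size 2r + 1, so r = 1 corresponds to the smallest neighborhood size of 3 used in the experiments.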

2.2 The Algorithm

Starting from an initial population $P^0$ consisting of $\mu$ individuals, the generation transition is modeled as

$P^{t+1} := \mathrm{opt}_{LES}(P^t)$,

where $\mathrm{opt}_{LES} : I^\mu \to I^\mu$ defines the $\mu$ times independently and locally operating reproduction cycle as

$\mathrm{opt}^i_{LES} := \mathrm{sel}^i \circ (\mathrm{mut} \circ \mathrm{rec}^\nu)^{\lambda/\mu}$ $(1 \le i \le \mu)$.

The mutation operator $\mathrm{mut} : I \to I$ is not affected by the introduction of the concept of locality. For a description we refer to [13]. The remaining selection and recombination operators are both transformed by the local interaction ES.

The selection operator in the traditional comma-ES acts globally in the sense that out of $\lambda$ descendants the $\mu$ best are chosen to form the next parent population. In the diffusion ES the selection for survival acts locally only and for each individual independently. Now, we have

$\mathrm{sel}^i : I^{\lambda/\mu} \to I$ $(1 \le i \le \mu)$.

Thus, the selection for survival takes the form that from a local pool of $\lambda/\mu$ descendants the best individual is selected and replaces the center individual of the local neighborhood. This is performed independently for each population member, i.e. $\mu$ times. In the following we still assume a generational approach where the entire current parent population is updated in discrete time steps.

The recombination operator is defined as $\mathrm{rec}^\nu : I^\nu \to I$ with $\mathrm{rec}^\nu := \mathrm{re} \circ \mathrm{co}$. The operator $\mathrm{re} : I^\rho \to I$ creates one offspring by combining the parents' characteristics and is not affected by the spatial concept. We refer to [13] for a description. The operator $\mathrm{co} : I^\nu \to I^\rho$ acts locally only and chooses $1 \le \rho \le \nu$ parents from a neighborhood for all population members $a_i$, $1 \le i \le \mu$.

An obvious way to adapt the parent selection method used in the traditional ES to the local selection ES is to choose both parents from a local neighborhood with uniform probability. This is the particular selection form used in LICE [5]. We call this local parent selection and denote it as $\mathrm{co}^l$. We propose another parent selection method where the center individual of a neighborhood is taken as one parent and only the second parent is chosen with uniform probability from the neighborhood; this is called centric parent selection and denoted as $\mathrm{co}^c$.
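The reproduction cycle above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the LICE implementation: the function names and signatures are hypothetical, fitness is assumed to be minimized, the recombination and mutation operators are passed in as callables, and the `selfing` flag switches between drawing parents with and without replacement.

```python
import random


def choose_parents(neigh, center, method="centric", selfing=True, rng=random):
    """Pick two parent indices from a neighborhood (rho = 2).

    method="local":   both parents are drawn uniformly from the neighborhood.
    method="centric": the center is one parent, the other is drawn uniformly.
    selfing=False draws without replacement, so the two parents always differ.
    """
    if method == "centric":
        pool = neigh if selfing else [j for j in neigh if j != center]
        return center, rng.choice(pool)
    if selfing:
        return rng.choice(neigh), rng.choice(neigh)
    first, second = rng.sample(neigh, 2)
    return first, second


def les_generation(pop, fitness, neighborhood, recombine, mutate,
                   kids=6, method="centric", selfing=True, rng=random):
    """One generational comma-LES step: each site is replaced by its best local kid."""
    new_pop = []
    for i in range(len(pop)):
        neigh = neighborhood(i)
        offspring = []
        for _ in range(kids):                        # lambda/mu local descendants
            p1, p2 = choose_parents(neigh, i, method, selfing, rng)
            offspring.append(mutate(recombine(pop[p1], pop[p2], rng), rng))
        new_pop.append(min(offspring, key=fitness))  # local selection for survival
    return new_pop
```

Because the new population is built in a separate list, all sites are updated from the same parent generation, which corresponds to the generational approach assumed in the text.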

3 Growth Curve Analysis

A standard technique used to analyze selection algorithms is based on the examination of the growth curve of an initial single best individual over time by selection alone [14], i.e. neither recombination nor mutation is active. To extend the growth curve analysis to truncation selection mechanisms we need to include both the selection for reproduction and the selection for survival. This is done by selecting a pair of parents for reproduction according to $\mathrm{co}$ and defining the child to be a copy of either one of the parents. The selection for survival then selects the best offspring out of a local pool of size $\lambda/\mu$ to be included in the next reproduction cycle.

3.1 The Phenomenon Self-mating

In ES the parents are chosen from the population with replacement. Hence, the chance that both parents are the same increases as the size of the pool from which the parents are drawn is reduced. Thus, in local selection ES with their small local pool size we expect an increased rate of self-mating, or selfing for short. The consequence of selfing in an actual ES is that an offspring is created by mutation only, as the recombination has no effect when both parents are identical.

To exclude selfing we may use a parent selection method without replacement. This is denoted by a bar above the parent selection method, $\overline{\mathrm{co}}^l$ or $\overline{\mathrm{co}}^c$, respectively. The number of different possible matings in the centric parent selection is $\nu$ with selfing and $\nu - 1$ without selfing. There is thus $\nu - (\nu - 1) = 1$ case of self-mating out of $\nu$ different matings, and the rate of self-mating is $1/\nu$. In the case of local parent selection we count $\nu(\nu + 1)/2$ different mating possibilities if selfing is allowed and $\nu(\nu - 1)/2$ without selfing. The difference between these terms yields $\nu$ possible combinations of self-mating. From this we can compute the rate of selfing as $\nu / (\nu(\nu + 1)/2) = 2/(\nu + 1)$.

However, the main difference between a selection method where selfing is allowed, $\mathrm{co}$, and its counterpart with selfing disabled, $\overline{\mathrm{co}}$, concerns the propagation of information. If selfing occurs in the centric parent selection method $\mathrm{co}^c$, then the descendant produced from the center individual replaces the center individual itself. Consequently in this case we have no propagation of information, and thus we expect slower growth curves compared to selfing disabled, $\overline{\mathrm{co}}^c$. In contrast, when using local parent selection both parents are chosen randomly from within the neighborhood. Then we count $\nu$ different possible self-matings, of which only a single one is coupled with no propagation of information. We thus expect the differences in the growth curves of $\mathrm{co}^l$ and $\overline{\mathrm{co}}^l$ to be negligible for the number of descendants generally assumed to be generated.

Figure 1 shows the effects of selfing (self) vs. no selfing (noself) on the growth curves. A population size of 100 with a ring population structure and a neighborhood size of $\nu = 5$ is used. The number of descendants produced locally is set at 4 and 6, respectively. We identify different curves for the centric parent selection methods $\mathrm{co}^c$ and $\overline{\mathrm{co}}^c$, whereas the curves for the local parent selection methods $\mathrm{co}^l$ and $\overline{\mathrm{co}}^l$ are identical.

Note that the curves shown in all figures of sections 3 and 4 are averages over 100 runs each to eliminate random effects. To facilitate the interpretation of the figures the legends are sorted from weakest to strongest selection pressure, with the exception of reference curves placed at the bottom.
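The mating counts of section 3.1 can be checked by enumeration. The following Python sketch counts distinct (unordered) matings, as the text does; it is not a simulation of draw frequencies. The function name and the convention that site 0 acts as the center are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def selfing_rate(nu, method):
    """Share of self-matings among the distinct matings in a neighborhood of size nu."""
    sites = range(nu)          # site 0 plays the role of the center individual
    if method == "centric":
        matings = [(0, j) for j in sites]
    elif method == "local":
        matings = list(combinations_with_replacement(sites, 2))
    else:
        raise ValueError(f"unknown method: {method}")
    selfings = [m for m in matings if m[0] == m[1]]
    return Fraction(len(selfings), len(matings))
```

For the neighborhood size of 5 used in Figure 1, this gives 1/5 for centric and 1/3 for local parent selection, in agreement with the expressions 1/ν and 2/(ν + 1).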

Figure 1: Growth curves for local (l) and centric (c) parent selection with and without selfing. (Panel: Ring Population Structures; x-axis: Number of Generations.)

3.2 Local vs. Centric Parent Selection

In this and the next section we disable selfing to eliminate side effects related only to the size of the neighborhood, i.e. we concentrate on $\overline{\mathrm{co}}$. The parent selection mechanisms differ in the make-up of each pool of competing offspring: with $\overline{\mathrm{co}}^l$ the parents are a random sample taken from the local neighborhood, whereas with the centric parent selection $\overline{\mathrm{co}}^c$ the pool consists of half siblings, as all locally generated descendants have one parent, the center individual, in common.

If we assume an infinite population, the number of members of the best class $N_t$ in generation $t$ can be approximated from above as follows. In the ring population structure information may only propagate linearly and the range of propagation is limited by the neighborhood size. This yields $N_t = N_0 + t(\nu - 1)$, with $N_0 = 1$. In the torus population structure the range of propagation is again limited by the neighborhood size, but the information may now propagate in four directions. Thus, we have $N_0 = 1$, $N_1 = N_0 + 1 \cdot (\nu - 1)$, ..., $N_t = N_{t-1} + t(\nu - 1)$. This yields $N_t = \frac{t(t+1)}{2}(\nu - 1) + 1$.

Figure 2 shows the growth curves for a finite population size of 100 and $\lambda/\mu = 6$, a selective pressure commonly used with ES. The population structure is either a ring with $\nu \in \{3, 5, 7, 9\}$ or a 10 x 10 torus. The ring population structure with the smallest neighborhood size gives the slowest growth curves, as predicted by the approximation given above, and the local parent selection form is faster than the centric parent selection form. Increasing the neighborhood size leads to a decrease of the takeover time of the best class. For the torus the growth rate is extremely fast even though $\nu$ has been set to the smallest possible value. This is what we expected, as the propagation of the initially single best is now quadratic.

The limiting case for LES is when the neighborhood size approaches the size of the population, i.e. $\nu = \mu$. In this specific case the growth rate is independent of the chosen population structure, so the population structure is referred to as "any". The fastest growth curve in figure 2 belongs to the standard (100,600)-ES, which has been included as a reference.
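The two upper bounds for the size of the best class can be written down directly. The Python sketch below (function names are assumptions of this illustration) evaluates the linear ring bound, iterates the torus recurrence, and compares it with the closed form.

```python
def ring_best_class(t, nu):
    """Upper bound N_t = N_0 + t * (nu - 1) with N_0 = 1 (ring)."""
    return 1 + t * (nu - 1)


def torus_best_class(t, nu):
    """Upper bound from N_t = N_{t-1} + t * (nu - 1) with N_0 = 1 (torus)."""
    n = 1
    for k in range(1, t + 1):
        n += k * (nu - 1)
    return n


def torus_closed_form(t, nu):
    """Closed form t * (t + 1) / 2 * (nu - 1) + 1 of the recurrence above."""
    return t * (t + 1) // 2 * (nu - 1) + 1
```

For the von Neumann neighborhood (ν = 5) the torus bound gives 1, 5, 13, 25 for t = 0, ..., 3, which are the sizes of the growing diamond around the initial site. In a finite population of size μ the bound is of course capped at μ.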

Figure 2: Growth curves for local (l) and centric (c) parent selection for various neighborhoods. (Panels: Ring Population Structures; Torus Population Structures and Global Selection Forms; x-axis: Number of Generations.)

4 Analysis of the Decrease in Variation

The growth curve analysis shows only one aspect of the behavior when changing the selection mechanism. The other aspect concerns the loss of variability in the population over time. In small populations we make the paradoxical observation that even with no selective pressure the variability of the population decreases over time. Although we cannot predict the direction of this selection process, we can determine the extent to which we expect this phenomenon to occur. To analyze the loss of variability under various conditions we initialize the startup population with $\mu$ different classes, all being assigned the same fitness. Then we run the algorithm as before and count for each generation the number of different classes present in the population.

Figure 3 shows the decrease of the number of different classes found in the population over time for the same settings of the algorithm as above. The centric parent selection method always keeps variability longer than its local parent selection counterpart. With a ring population structure and very small neighborhood sizes we still found a significant variability at generation 300. The populations with a torus structure have lost most of their initial variability at generation 300. Both global selection methods are very close to total convergence at generation 300: (100,6,100)-any with $\overline{\mathrm{co}}^l$ has an average variability of only 1.1, with $\overline{\mathrm{co}}^c$ we have 1.35, and the standard (100,600)-ES lies between the two with 1.2 different classes on average.

Figure 3: Decrease curves of variability for local parent selection (l) and centric parent selection (c). (Panels: Ring Population Structures; Torus Population Structure and Global Selection Forms; x-axis: Number of Generations.)
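The experiment of this section can be imitated with a small simulation. The Python sketch below is a simplified illustration, not the authors' test environment: it assumes a ring population, selfing disabled, and random tie-breaking among the equally fit kids, so that the survivor at each site is simply a copy of one randomly chosen parent. The function name, its parameters, and the seed are hypothetical.

```python
import random


def count_classes_over_time(mu=100, r=1, generations=300, method="centric", seed=0):
    """Number of distinct classes per generation on a ring with no selective pressure.

    All individuals have equal fitness, so the surviving kid of each site is
    taken to be a copy of one parent picked at random (an assumption).
    """
    rng = random.Random(seed)
    pop = list(range(mu))                  # mu different classes initially
    history = [len(set(pop))]
    for _ in range(generations):
        new_pop = []
        for i in range(mu):
            neigh = [(i + d) % mu for d in range(-r, r + 1)]
            if method == "centric":        # center plus one other neighbor
                parents = (i, rng.choice([j for j in neigh if j != i]))
            else:                          # local: two distinct random neighbors
                parents = tuple(rng.sample(neigh, 2))
            new_pop.append(pop[rng.choice(parents)])
        pop = new_pop
        history.append(len(set(pop)))
    return history
```

Since every new individual copies the class of an existing one, the number of classes can never increase from one generation to the next. Averaging several runs of `count_classes_over_time` with `method="centric"` and `method="local"` would allow a qualitative comparison with the curves of Figure 3.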

5 The Benchmark Problems

This section shows the behavior in terms of convergence reliability and convergence velocity of the various forms of local interaction ES on two multi-modal minimization problems: the generalized Rastrigin function and Shekel's Foxholes problem with 25 holes [15]. These problems have also been the subject of [5].

The implementation of our test environment is based on LICE-1.02 [5]. We added the centric parent selection method, the possibility to exclude self-mating, and the support of further population structures. This paper focuses on the ring and the torus with a compact neighborhood. The representation uses $n$ object variables $x_j$ ($1 \le j \le n$) and a single mean step size $\sigma$ for all $x_j$, as experiments with LES using multiple step sizes $\sigma_j$ failed in all cases [5]. The global mutation parameter $\tau_0$, controlling the overall change of the mutability, and the individual mutation parameter $\tau$ have been set to the same values as used in [5]: $\tau_0 = 0.1$, $\tau = 0.3$. The recombination operator is discrete for the object variables $x_j$ and intermediate for the step size $\sigma$.

20 experiments with 20 runs each were performed. For each experiment, we counted how often the global best was found. The tables give the percentage of successful runs (perc) and, for these, the average number of generations until the global best was found (gen). If selfing is allowed, we also give the average number of kids per generation that were produced by selfing and accepted to survive and become a parent (acc).

5.1 Generalized Rastrigin Function

The generalized Rastrigin function is a multi-modal and high-dimensional problem with $n = 20$ object variables. It is known to be very difficult for ES. The individuals are initialized by setting the object variables $\forall i: x_i = 512$ and the step size to $\sigma \in [0.5, 5]$.

Table 1 summarizes the results for local and global selection ES. The columns gen reflect our results from the growth curve analysis. A faster convergence is observed with an increase of the selection pressure induced by the parent selection method or a larger neighborhood size.

Figure 4 shows the columns perc of table 1. The winning strategy is LES with the smallest possible neighborhood size. Increasing the neighborhood size first degrades the performance, but as the neighborhood size becomes large enough in relation to the population size the reliability of the final quality increases again due to the availability of global knowledge. In comparison with the standard ES, the higher explorative behavior of $\mathrm{co}^c$ (c, self) and $\overline{\mathrm{co}}^c$ (c, noself) leads to higher rates of convergence to the global optimum for any neighborhood size.

Following the data of table 1, the torus population structure did not show the type of emergent behavior observed with the linear ring population structure. In fact, with the exception of $\overline{\mathrm{co}}^l$, global rules led to better results than local interaction. The LES gets its highest gain from the local differentiation process when the population structure is a ring and the neighborhood size is very small. In this case, the centric method with selfing $\mathrm{co}^c$ performs best, closely followed by the local method without selfing $\overline{\mathrm{co}}^l$.

Table 1: Probability of finding the global optimum for Rastrigin's function.

Strategy          | co^l (self)      | co^l (noself) | co^c (self)      | co^c (noself)
                  | perc  gen  acc   | perc  gen     | perc  gen  acc   | perc  gen
(100,6,3)-ring    | 67.0  314  31.0  | 86.5  310     | 89.5  396  41.0  | 76.0  320
(100,6,5)-ring    | 45.0  272  18.8  | 61.5  270     | 70.5  308  18.7  | 70.5  290
(100,6,9)-ring    | 34.5  244  10.4  | 56.5  247     | 65.0  276   9.6  | 66.5  271
(100,6,13)-ring   | 33.5  235   7.2  | 49.5  238     | 62.0  268   6.4  | 68.0  266
(100,6,5)-torus   | 32.5  248  19.1  | 58.0  253     | 52.5  270  19.7  | 57.0  292
(100,6,9)-torus   | 34.5  235  10.5  | 54.5  238     | 65.0  264   9.6  | 56.5  269
(100,6,13)-torus  | 38.0  231   7.2  | 53.0  233     | 67.5  261   6.4  | 70.5  264
(100,6,100)-any   | 44.5  223   0.9  | 42.0  223     | 70.1  257   0.9  | 73.5  257
(100,600)-ES      | 56.5  193   1.7  | -     -       | -     -    -     | -     -

Figure 4: Convergence reliability for Rastrigin's Function. (x-axis: Size of Neighborhood $\nu$; curves: (100,6,$\nu$)-ring and (100,6,$\nu$)-torus, each with c self, c no-self, l self, and l no-self, plus the (100,600)-ES as reference.)
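For readers who wish to reproduce the setting, the objective function and the recombination scheme can be sketched in Python as follows. The paper does not spell out the formula, so the commonly used form of the generalized Rastrigin function with constant a = 10 is an assumption here, as are the function names and the representation of an individual as a pair (x, sigma).

```python
import math
import random


def rastrigin(x, a=10.0):
    """Generalized Rastrigin function (common form); minimum 0 at x = (0, ..., 0)."""
    return a * len(x) + sum(xi * xi - a * math.cos(2.0 * math.pi * xi) for xi in x)


def recombine(parent1, parent2, rng=random):
    """Discrete recombination of object variables, intermediate for the step size.

    An individual is a pair (x, sigma) with a single step size sigma.
    """
    (x1, s1), (x2, s2) = parent1, parent2
    x = [rng.choice(pair) for pair in zip(x1, x2)]   # each x_j from either parent
    return x, 0.5 * (s1 + s2)                        # mean of the two step sizes
```

A usage example under the same assumptions: `recombine(([512.0] * 20, 0.5), ([512.0] * 20, 5.0))` returns a child with all object variables equal to 512 and a step size of 2.75.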

5.2 Shekel's Foxholes Function

The Shekel's Foxholes problem, widely known as the function F5 of De Jong's test suite, is known as a difficult problem for ES. The topology is an area-wide plateau with holes of different depth. With the standard ES, the problem arises that once the population gets attracted by one of the holes it is almost impossible for the algorithm to leave it again. In fact, in none of 200 runs of a (100,600)-ES was an optimum solution found. The initial setting of the object variables was chosen far away from the holes, $\forall i: x_i = 500$, and the step size is $\sigma \in [10^{-4}, 10^{4}]$. Table 2 shows that the centric parent selection method $\mathrm{co}^c$ with the smallest possible neighborhood size is the clear winner: the global best is found in almost all runs. In


Strategy

4

CO

CO1

perc (100,6,3)-ring (1 00,6,5)-ring (100,6,9)-ring (100,6,13)-ring (100,6,25)-ring (100,6,5)-torus

1 I

74.5 84.5

gen

acc

156

11.8

176

4.1 21.1

171

I

I

perc

gen

perc

CgOeSn

acc

60.5 43.0

176 221

20.0 96.0

248 209

4.2 25.5

1 I

I

per