Galley Proof, 9/11/2009; 17:25, File: kes184.tex; BOKCTP/llx

International Journal of Knowledge-based and Intelligent Engineering Systems 13 (2009) 1–22, DOI 10.3233/KES-2009-0184, IOS Press
To explore or to exploit: An entropy-driven approach for evolutionary algorithms

Shih-Hsi Liu (a,c,*), Marjan Mernik (b) and Barrett R. Bryant (a)

(a) Department of Computer and Information Sciences, University of Alabama at Birmingham, USA
(b) Faculty of Electrical Engineering and Computer Science, University of Maribor, Slovenia
(c) Department of Computer Science, California State University, Fresno, USA
Abstract. An evolutionary algorithm is an optimization process comprising two important aspects: exploration discovers potential offspring in new search regions, while exploitation utilizes promising solutions already identified. An intelligent balance between these two aspects may drive the search process towards better fitness results and/or faster convergence rates. Yet how and when to control this balance perceptively has not been comprehensively addressed. This paper introduces an entropy-driven approach for evolutionary algorithms. Five kinds of entropy for expressing diversity are presented, and the balance between exploration and exploitation is adaptively controlled by one kind of entropy and the mutation rate in a metaprogramming fashion. Experimental results on benchmark functions show that the entropy-driven approach achieves an explicit balance between exploration and exploitation and hence obtains even better fitness values and/or convergence rates.

Keywords: Evolutionary algorithms, genetic algorithms, entropy, parameter control, exploration, exploitation
1. Introduction

An Evolutionary Algorithm (EA) [4,18,35] is a population-based meta-heuristic and stochastic optimization search process that mimics Darwin's theory of evolution in compliance with Computer Science principles and methodologies. In order to obtain optimal solutions, the search process (i.e., evolutionary process) of an EA is leveraged by two important aspects: exploration visits entirely new regions of a search space to discover promising offspring, while exploitation utilizes the information from previously visited regions to determine potentially profitable regions to be visited next [13]. Although optimal search solutions can be acquired by leveraging the synergy of exploration and exploitation, the correlation between the two aspects has not been explicitly distinguished. How and when to control and balance exploration and exploitation in the search process to obtain better fitness results and/or faster convergence is still on-going research.

* Corresponding author. E-mail: [email protected] or [email protected].
How are exploration and exploitation controlled during an evolutionary process? Besides the parameter tuning category, where the balance between exploration and exploitation is implicitly driven by fixed parameters during the evolutionary process, there are two categories of approaches to this topic: uni-process-driven and multi-process-driven. In the uni-process-driven approach, the balance between exploration and exploitation is controlled by a single process (e.g., selection or population resizing [12,45]) or a single operator (e.g., mutation or crossover). Various studies (e.g. [3,40,44]) showed that a selection process controls the level of exploration or exploitation by varying its selection pressure. Directing an evolutionary process towards exploration or exploitation is also possible by population resizing [23,45]: with a bigger population size, the search space is explored more than with a smaller population size. The mutation and crossover operators also adjust the power of exploration and exploitation by tuning their respective mutation rate (pm) and crossover rate (pc) towards either aspect. Conversely, in the multi-process-driven approach, because the above processes/operators are capable of controlling
ISSN 1327-2314/09/$17.00 2009 – IOS Press and the authors. All rights reserved
exploration and exploitation jointly, more than one process/operator can be selected to control the two aspects individually. For example, with high selection pressure forcing exploitation, a mutation/crossover operator biased towards exploration should be chosen. When are exploration and exploitation controlled during an evolutionary process? There are likewise two categories of approaches to this topic: fitness-driven and diversity-driven. In the fitness-driven approach, fitness values are applied to determine the appropriate time to leverage exploration or exploitation. For example, if the best fitness value is improved at a given generation or has not improved for a predefined number of generations, ProFIGA (Population Resizing on Fitness Improvement Genetic Algorithms) [12] increases the population size; otherwise, the population size is decreased. The 1/5 success rule [5] is also classified in this category: the ratio of better-fitness individuals generated by the mutation operator is regarded as an adaptation criterion to update the mutation rate towards more exploration or exploitation. In the diversity-driven approach, diversity measures (e.g., standard deviation and Euclidean distance) control when to perform more exploration or exploitation. For example, the Diversity-Guided Evolutionary Algorithm (DGEA) [47] uses a distance-to-average-point measure to alternate between exploration and exploitation phases. As an extension of our preliminary studies [29,37], this paper applies uni-process-driven and diversity-driven approaches to balance exploration and exploitation. Based on the empirical experiments in [37], Euclidean distance and standard deviation of fitness values are not good diversity measures for controlling the evolutionary process. Hence, an entropy-driven approach was investigated.
Such an approach utilizes mutation as a balance operator and one type of entropy (i.e., linear entropy) as a diversity measure to tackle the how and when topics, respectively. Four other types of entropy (i.e., Rosca's, Gaussian, fitness proportional, and clustering) are also presented as reference diversity measures. The adaptive decision process on how and when is described by a domain-specific language [34], called PPCEA (Programmable Parameter Control for Evolutionary Algorithms) [28]: when entropy is larger than a threshold, the adaptive decision process directs the evolutionary process to exploit visited regions using smaller mutation rates; conversely, when entropy is smaller than the threshold, the process tends to explore more new regions of the search space using higher mutation rates. Because different kinds of fitness functions
may possess various exploration/exploitation characteristics, this paper experiments with the entropy-driven approach on twelve unimodal and multimodal functions from [49]. The Schaffer [42], Fogarty [15], and 1/5 success rule (introduced by Rechenberg and refined by Bäck and Schwefel [5,10]) approaches are also revisited as antithetic studies. The experimental results show that the entropy-driven approach using linear entropy not only explicitly balances exploration and exploitation, but also converges faster towards optimal solutions of unimodal functions (except for noisy functions [30]) and obtains even better optimal results on multimodal functions. Because of space considerations, the paper only presents the experimental results of genetic algorithms [35] controlled by linear entropy. Experimenting with the five kinds of entropy to balance the two aspects in evolution strategies [35], and with the multi-process-driven approach to adjust the directions and magnitudes of the two aspects in both genetic algorithms and evolution strategies, is our future work. Additionally, self-adaptive parameter control [11] using PPCEA will be addressed in the future. The paper is organized as follows. Section 2 describes the related work. In Section 3, the definitions of five kinds of entropy categorized by different fitness classes and levels of abstraction (i.e., genotype versus phenotype) are presented, and the entropy-driven approach to controlling exploration and exploitation with PPCEA is introduced. Section 4 shows the experimental results for the twelve unimodal and multimodal functions from [49]. Finally, Section 5 addresses the conclusion and future directions.
2. Related work

The balance between exploration and exploitation has, up to now, mainly been controlled by determining the settings of parameter values for the involved processes and operators. There are a variety of studies on this topic (e.g. [6,11,19,28,46]). Recommendations on parameter settings for particular sets of problems can be found in [9,42]. The following paragraphs classify related work into three diversity-centered categories relevant to the how and when topics:

– Diversity Persistence/Enhancement: To maintain genetic diversity and hence provide a better balance between exploration and exploitation, numerous techniques have been introduced at different stages of an evolutionary process (e.g., selection,
mating, and replacement). A few such diversification techniques are: scaling mechanisms (e.g., sigma scaling, rank scaling) [18], non-standard selections (e.g., restricted tournament selection [22]), genotypic entropy-driven selections (e.g. [26,36]), fitness sharing [18], incest prevention [14], seduction [40], crowding [9], neighborhoods [8], islands [33], the struggle genetic algorithm [20], and diversity-control-oriented genetic algorithms (e.g. [8,44]). The main idea of these techniques is to maintain genetic diversity and hence avoid premature convergence. More recent approaches in this direction are reported in [17,24,31]. In [17], a new selection method was introduced to provide a clear separation between exploration and exploitation. A population is separated into two sexes. The selection of females handles the exploration of the search space, because all females get to reproduce regardless of their fitness. Conversely, the selection of males handles the exploitation of the search space, because only fitter males are selected. Huang [24] presented a modified evolution strategy with a diversity-based selection pooling scheme, which modulates the number of parents entering the selection pool based on several diversity measures. In [31], a hybrid replacement scheme was introduced such that individuals are replaced due to their poor performance in terms of fitness and diversity.

– Diversity Measurements: The diversity persistence/enhancement approaches concentrate on maintaining genetic diversity represented by regular fitness computations. As the aforementioned standard deviation and Euclidean distance are not explicit diversity measures for the when topic, more explicit measurements of diversity are discussed here.
One of the earliest researchers to investigate entropy, a representation of diversity, in EAs was Rosca [41], whose experiments showed that populations appeared to be stuck in local optima when entropy did not change or decreased monotonically over successive generations. Rosca used fitness values in a population to define entropy and a free-energy measure. The Diversity-Guided Evolutionary Algorithm (DGEA) [47] used a distance-to-average-point measure to alternate between phases of exploration and exploitation. DGEA can be expressed easily as a PPCEA program [28]; however, DGEA does not use entropy as a measure of diversity. In [32], entropy was also introduced into EAs for determining the optimal number of
clusters. However, in this case the fitness function is entropy-based, whereas the four kinds of entropy introduced in the next section are variants of Rosca's entropy serving as new diversity measures to facilitate an explicit balance between exploration and exploitation.

– Diversity Learning: In order to explicitly and efficiently balance exploration and exploitation, learning techniques have recently been adopted. The general idea of these approaches is to store useful records (e.g., diversity) of an evolutionary process, analyze and learn from the records, and adjust search processes accordingly. In [2], the Genetic Algorithm using Self-Organizing Maps (GASOM) utilizes neural network concepts to this end. Initially, a training set of a Self-Organizing Map (SOM) [25] is exercised as a reference for diversifying individuals. Two tables that store current population diversity and search history are analyzed and visualized using the SOM. Users can easily discover the regions of the search space to be explored. The balance between exploration and exploitation is controlled by a balance function, which indicates the number of neurons that should be activated at the current generation. A reseeding operator that preserves the exploration power is driven by the difference between expected and current activations of neurons.

One of the main objectives of this paper is to extend Rosca's entropy in search of more explicit diversity measures in EAs. This paper also concentrates on introducing a new diversity learning approach that uses entropy as feedback to control exploration and exploitation as well as to maintain diversity on-the-fly using PPCEA [28].
3. Entropy-driven exploration and exploitation

In EAs, mappings between genotypes and phenotypes can be classified into three categories: (a) one-to-one (i.e., bijective [48]), where identical genotypes produce identical phenotypic fitness; (b) one-to-many, where identical genotypes produce different phenotypic fitness; and (c) many-to-one (i.e., non-injective and surjective, or non-injective and non-surjective [48]), where different genotypes produce the same phenotypic fitness. Also, all mappings can be non-deterministic if the genotype, the phenotype, or both are influenced by noise. For a one-to-one mapping, a decrease in genotype diversity will necessarily cause a decrease in phenotype diversity. For one-to-many and many-to-one mappings, the relationship between genotype and phenotype diversities is problem-specific and hence requires individual study. To measure the entropy introduced in this section, one needs to define some structural measures. Such measures, however, might be computationally intensive in some instances of EAs (e.g., genetic programming) [7]. Fortunately, such measures at the phenotype level are sufficient for the benchmark functions featuring one-to-one mappings (e.g., f1, f2, f4, f6, f9, f10, f11, f14, f16, f17, and f20 from [49]). The observation of a benchmark function featuring a one-to-many or many-to-one mapping (e.g., the noisy f7 from [49]) is addressed in Section 4 and further studied in [30]. The following subsection presents five kinds of entropy, accompanied by the approach to obtain better optimization search results and/or faster convergence rates through a balance between exploration and exploitation using one kind of entropy and various mutation rates. Discussions on the five kinds of entropy used for one-to-many and many-to-one mappings are also addressed in Section 3.1.

3.1. Entropy in EAs

Entropy is a concept in thermodynamics, information theory, statistical mechanics, and other fields, used to express disorder. Shannon [43] defines entropy in terms of a discrete random event x, with possible states (i.e., classes) 1..n, as:

H(x) = Σ_{i=1..n} pi · log2(1/pi) = − Σ_{i=1..n} pi · log2(pi)    (1)
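For concreteness, equation (1) can be computed directly from the sizes of the fitness classes. The following Python sketch is illustrative rather than code from the paper; the function name and its input format are our own:

```python
import math

def shannon_entropy(class_sizes):
    """Shannon entropy H = -sum(p_i * log2(p_i)) over fitness classes,
    where p_i is the fraction of the population in class i.
    Empty classes contribute nothing, since p*log2(p) -> 0 as p -> 0."""
    total = sum(class_sizes)
    h = 0.0
    for size in class_sizes:
        if size > 0:
            p = size / total
            h -= p * math.log2(p)
    return h
```

With four equally occupied classes the entropy is log2(4) = 2 bits; with all individuals in a single class it drops to 0, which is why falling entropy signals loss of diversity.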
In [41], Rosca introduced entropy to the evolutionary computation community by adapting Shannon's formula. Such an entropy is computed by classifying individuals into a fitness class i according to their behavior or phenotype; pi is then the proportion of the population in fitness class i. To express disorder sufficiently and appropriately for different fitness functions, and to facilitate an explicit balance between exploration and exploitation using the entropy-driven approach, this paper extends Rosca's entropy with four new kinds of classification strategies. Another motivation for looking at different classification strategies was our impression that Rosca's classification is too fine-grained: even the smallest difference in fitness value will classify an individual into a different fitness class. In a noisy environment (e.g., the noisy quartic function f7 [49]), such a fine-grained classification is not a beneficial characteristic for obtaining information regarding exploration and exploitation. Figure 1 shows four new kinds of entropy determined by different classification strategies. Note that the first three types of entropy have been presented in [29] and tested only on functions f1, f9, and f17 of [49]:

– Linear: Assign a predefined yet changeable value to the number of fitness classes, n. For each generation, the interval between the best and worst fitness values is evenly partitioned into n sub-intervals serving as fitness classes (the top left chart of Fig. 1). An individual whose fitness value falls in a specific sub-interval is classified into the corresponding fitness class. Note that the top left of Fig. 1 merely illustrates the classification strategy adopted by linear entropy; individuals may be either reasonably spread or chaotically sparse in a population.

– Gaussian: The classification strategy of fitness classes in this case is derived from the Gaussian distribution [38]. For each generation, fitness classes are "spread out" from the average fitness value (average) with the standard deviation (σ). For example, the upper/lower bound of the first fitness class (P1 in the middle chart) is computed as average ± σ. The boundaries of the successive classes (P2 to P5) can be generalized as average ± i·σ, where i ∈ Z+ and i ≤ 4 [38]. The lower bound of the leftmost fitness class is less than or equal to the smallest fitness value, and the upper bound of the rightmost fitness class is larger than or equal to the biggest fitness value.

– Fitness proportional: The fitness proportional approach is a variation of Rosca's approach [41]. Rosca's fitness classes are partitioned by individuals according to their behavior or phenotype; pi is the proportion of a population occupied in the ith fitness class.
In this approach, pi is defined the same as the probability of selecting individual i in fitness proportional selection, pi = fi / Σ_{i=1..Popsize} fi, where fi is the fitness value of an individual. Hence, pi is the criterion for categorizing fitness classes. If all individuals of a population have different pi (namely, different fitness values), the number of fitness classes n equals the population size (Popsize). If more than one individual has the same fitness value (i.e., fi = fj and pi = pj where i ≠ j, based on the above fitness proportional definition), equation (1) avoids counting duplicate individuals and n < Popsize. This is because two identical fitness classes are unnecessary, and their elimination complies with the definition of diversity. The bottom left chart of Fig. 1 shows 15 fitness classes sorted by pi, each of which is occupied by one or more individuals.

– Clustering: In order to classify and analyze large amounts of data, clustering algorithms are widely utilized in the data mining community [21]. Such algorithms group data into several clusters based on specific similarity criteria. This paper utilizes the K-Means clustering approach [21] to partition individuals into k clusters, where 1 ≤ k ≤ popsize and k is a fixed number in the evolutionary process. Initially, k individuals are selected as the centroids of the clusters. During the iterative classification process, each individual computes the Euclidean distances between itself and the k centroids. Thereafter, all individuals are assigned to a new cluster based on the computed shortest distance. Each centroid is updated by averaging the genotypes of the individuals in the cluster. The iterative classification process continues until no centroids are updated any longer.

Fig. 1. Linear (top left), Gaussian (top right), fitness proportional (bottom left), and clustering (bottom right).
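As a concrete illustration of the linear classification strategy described above, the sketch below partitions the [worst, best] fitness interval into n sub-intervals, counts individuals per class, and applies equation (1). This is a minimal Python sketch under our own assumptions: the paper gives no implementation, and normalizing by log2(n) so that values fall in [0, 1] is our choice (consistent with entropy thresholds such as 0.5 used later), not a stated detail.

```python
import math

def linear_entropy(fitnesses, n_classes):
    # Evenly partition [worst, best] into n_classes sub-intervals,
    # count individuals per class, then apply Shannon's formula.
    # Dividing by log2(n_classes) (our assumption) maps the result
    # into [0, 1].
    if n_classes < 2:
        return 0.0
    lo, hi = min(fitnesses), max(fitnesses)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for f in fitnesses:
        if width == 0.0:
            idx = 0  # degenerate case: all fitness values identical
        else:
            idx = min(int((f - lo) / width), n_classes - 1)
        counts[idx] += 1
    total = len(fitnesses)
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h / math.log2(n_classes)
```

For example, linear_entropy([1.0, 2.0, 3.0, 4.0], 2) yields 1.0 (individuals spread evenly over both classes), while a fully converged population yields 0.0.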
Table 1 compares the five types of entropy: linear, Gaussian, fitness proportional, and Rosca's entropy utilize fitness values (phenotype) to categorize individuals into different fitness classes. Conversely, Euclidean distance is used to partition individuals for computing clustering entropy, which is regarded as a genotypic entropy.

Table 1. The comparison of the five fitness classes for entropy

Fitness classes for entropy | Classification level | Class number during the process | Boundary adaptation
Linear                      | Phenotype            | Fixed (n)                       | Adaptive
Gaussian                    | Phenotype            | Fixed (n)                       | Adaptive
Fitness proportional        | Phenotype            | Varied                          | Adaptive
Rosca                       | Phenotype            | Varied                          | Adaptive
Clustering                  | Genotype             | Fixed (k)                       | Self-adaptive

Note that the entropy classification levels in Table 1 are based on the entropy definitions described above. All of the above entropy types can, in fact, be defined and classified at both the genotype and phenotype levels. For example, linear entropy can use Euclidean distance to linearly classify individuals, and clustering entropy can use fitness as the similarity criterion to classify individuals. For simple diversity measures, using entropy on the phenotype is sufficient. However, for more advanced cases (e.g., individuals possessing the same phenotype but different genotypes, or vice versa), users should select the appropriate entropy at a specific level for the entropy-driven approach with respect to a particular mapping and specific functions. An example of selecting appropriate entropy for noisy functions has been published in [30].

Although the boundary values of the fitness classes of each type of entropy vary during an evolutionary process, the number of fitness classes (n) can be either fixed or varied. Based on the preceding descriptions, linear and Gaussian entropy predefine the value of n, where n of linear entropy can be tuned and n of Gaussian entropy is fixed permanently [38]. Clustering entropy also has a fixed number of fitness classes, equal to the number of centroids, k. Note that because k is an adjustable parameter in PPCEA, the K-Means clustering algorithm can be extended to hierarchical clustering and K-nearest-neighbor approaches [21], to name a few, in the future. Lastly, in terms of the adjustment of class boundaries during an evolutionary process, entropy can also be classified into parameter tuning, deterministic, adaptive, and self-adaptive. For linear, Gaussian, fitness proportional, and Rosca's entropy, the boundaries of fitness classes are determined based on the fitness values of all individuals during an evolutionary process. For clustering entropy, an iterative classification process autonomously assigns individuals to different clusters and then updates all centroids based on the Euclidean distances between individuals and centroids until a stop criterion is met. Such a process can be treated as a self-learning (i.e., self-adaptive) approach for adjusting the boundaries of fitness classes.

In order to fairly measure the differences and discover the most appropriate type of entropy, n of linear entropy is initially set to popsize (the population size). Such a preliminary setting is due to the fact that the numbers of fitness classes for Rosca's and fitness proportional entropy are initially close to popsize because of their classification strategies. Conversely, as an antithetic case, k is purposely assigned a value much smaller than popsize. The other reason for avoiding k = popsize is that clustering entropy would act like either Rosca's or fitness proportional entropy under that setting; in this case, the only difference among these three types of entropy would be that clustering entropy wastefully carries out the iterative classification process. Experimenting with different settings of n and k is an on-going task, and some results have been published in [30,37].
3.2. The entropy-driven approach

An overview and survey of the problem of parameter setting in EAs is given in [11]. Whether a set of initial parameters is changed during the run of an evolutionary process divides the problem into parameter tuning and parameter control, respectively. Numerous researchers have worked on discovering optimal settings using the parameter tuning approach (e.g. [9,19,42]). From their efforts, a mutation rate pm ≈ 1/L (where L is the bit length of an individual in genetic algorithms) and a crossover rate pc ≈ 0.75-0.95 are commonly agreed optimal parameter settings in genetic algorithms. The research on parameter tuning also reveals a new problem: the optimal solution of an evolutionary process is influenced by two main factors: (a) many parameters are dependent on each other; and (b) different types of fitness functions may behave differently under the same parameter settings. Hence, the parameter setting problem is function-specific. Such factors motivate the shift from parameter tuning towards parameter control, including deterministic, adaptive, and self-adaptive approaches [11]. The following paragraphs introduce the entropy-driven (adaptive) approach using PPCEA. The Schaffer (parameter tuning), Fogarty (deterministic), and 1/5 success rule (adaptive) approaches using PPCEA have been covered in [29]. Self-adaptive approaches are our future work.

To avoid premature convergence while maintaining a fast enough convergence rate and obtaining the same or even better optimal solutions compared to the Schaffer, Fogarty, and 1/5 success rule approaches, the entropy-driven approach is introduced. Similar to the 1/5 success rule [5], this approach addresses both "how" and "when" to balance exploration and exploitation.
However, the main difference between the two approaches lies in how pm is adapted from the feedback of the evolutionary process and in the measure used to control when: the 1/5 success rule utilizes the mutation operator to explore new regions if its fitness-driven measure for when, the success mutation ratio (φ), is greater than 0.2, and performs exploitation if φ is smaller than 0.2 (please refer to [27] for more details). Conversely, to fulfill the objectives of maintaining a fast enough convergence rate and obtaining optimal solutions, the entropy-driven approach uses entropy as a succinct diversity measure for when, and pm is again utilized to control how.

Fig. 2. The entropy-driven approach using PPCEA.

Figure 2 shows the entropy-driven approach to balancing exploration and exploitation using PPCEA: lines 11 to 13 express that if a population is more diverse than expected, pm is adjusted in smaller steps to exploit the current search space. Conversely, pm is adapted in larger steps to avoid premature convergence if the population is less diverse, as shown in lines 14 to 16. In order to have the same comparison criteria, the adjusting steps for pm in lines 12 and 15 are the same as in the 1/5 success rule approach [5,10], where 0.82 and 1.22 are reciprocal learning rates determined experimentally by Schwefel [10]. Both approaches utilize the same exploration leverage for how, yet different exploration directions (i.e., the 1/5 success rule approach tends to converge and the entropy approach tends to maintain diversity); the measures for when also differ. In this paper, linear entropy is used as the feedback parameter, and 0.5 is chosen as the threshold between the exploration and exploitation trends in the entropy-driven approach, based on the empirical experiments in [30,37]. The other types of entropy are auxiliary references and can still be applied by changing the parameters in lines 11 and 14. Additionally, by using PPCEA, one can easily express other, more sophisticated algorithms to control the balance between exploration and exploitation. For example, the current threshold is fixed. This limitation
can be easily improved by introducing deterministic or adaptive thresholds in a metaprogrammable style. Also, to facilitate exploration, population resizing mechanisms may randomly generate new individuals from scratch or select individuals from the existing pool, to name a few options, at any phase of an evolutionary process. The entropy-driven approach may be a good solution here, because the drastic entropy changes resulting from randomly generated individuals can be a good diversity measure to steer population resizing mechanisms towards optimization and/or convergence. Additionally, the 1/5 success rule and entropy-driven approaches aim, respectively, to explore an evolutionary process as much as possible and to balance exploration and exploitation of the process. More sophisticated algorithms that merge the two approaches to combine their advantages are foreseen.

The Schaffer, Fogarty, 1/5 success rule, and entropy-driven approaches experiment with the benchmark functions using the same initial parameter settings: popsize = 100 (population size), pm = 0.005 (mutation rate), pc = 0.75 (crossover rate), and Round = 50 (number of experiments). Epoch is the stride for changing parameters during an evolutionary process. Tournament selection (selection size = 10), one-point mutation, one-point crossover, and fitness evaluation using binary-to-floating-point conversion are performed to generate the experimental results presented in the next section. Note that PPCEA has implemented several additional selection processes as well as mutation and crossover operators in the underlying infrastructure [27]. Experimenting with EAs using different combinations of operators can be easily configured at the metaprogramming level.

Table 2. Benchmark functions (the listed benchmark functions are from [49])

Benchmark function                                                                        | N  | Rng                 | fmin
f1(x) = Σ_{i=1..N} xi^2                                                                   | 30 | [-100, 100]^N       | 0
f2(x) = Σ_{i=1..N} |xi| + Π_{i=1..N} |xi|                                                 | 30 | [-10, 10]^N         | 0
f4(x) = max_i {|xi|, 1 ≤ i ≤ N}                                                           | 30 | [-100, 100]^N       | 0
f6(x) = Σ_{i=1..N} (⌊xi + 0.5⌋)^2                                                         | 30 | [-100, 100]^N       | 0
f7(x) = Σ_{i=1..N} i·xi^4 + random[0, 1)                                                  | 30 | [-1.28, 1.28]^N     | 0
f9(x) = Σ_{i=1..N} [xi^2 - 10cos(2πxi) + 10]                                              | 30 | [-5.12, 5.12]^N     | 0
f10(x) = -20exp(-0.2·sqrt((1/N)Σ_{i=1..N} xi^2)) - exp((1/N)Σ_{i=1..N} cos(2πxi)) + 20 + e | 30 | [-32, 32]^N        | 0
f11(x) = (1/4000)Σ_{i=1..N} xi^2 - Π_{i=1..N} cos(xi/√i) + 1                              | 30 | [-600, 600]^N       | 0
f14(x) = [1/500 + Σ_{j=1..25} 1/(j + Σ_{i=1..2} (xi - aij)^6)]^(-1)                       | 2  | [-65.536, 65.536]^N | 1
f16(x) = 4x1^2 - 2.1x1^4 + (1/3)x1^6 + x1x2 - 4x2^2 + 4x2^4                               | 2  | [-5, 5]^N           | -1.0316285
f17(x) = (x2 - (5.1/(4π^2))x1^2 + (5/π)x1 - 6)^2 + 10(1 - 1/(8π))cos(x1) + 10             | 2  | [-5, 10] × [0, 15]  | 0.398
f20(x) = -Σ_{i=1..4} ci exp[-Σ_{j=1..6} aij(xj - pij)^2]                                  | 6  | [0, 1]^N            | -3.32
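The adaptive rule of Fig. 2 condenses to a few lines. The following Python sketch mirrors that logic (entropy above the 0.5 threshold shrinks pm by the factor 0.82 to exploit; otherwise pm grows by 1.22 to explore); the pm_min/pm_max clamping bounds are illustrative assumptions, not values from the paper:

```python
def adapt_mutation_rate(entropy, pm, threshold=0.5,
                        pm_min=0.001, pm_max=0.25):
    """One adaptive step of the entropy-driven rule sketched in Fig. 2:
    entropy above the threshold means the population is diverse, so
    shrink pm and exploit; entropy below means low diversity, so grow
    pm and explore. 0.82 and 1.22 are the Schwefel learning rates
    cited in the text; the clamping bounds are our own assumption."""
    if entropy > threshold:
        pm *= 0.82   # exploit: smaller mutation steps
    else:
        pm *= 1.22   # explore: larger mutation steps
    return min(max(pm, pm_min), pm_max)
```

Called once per Epoch, this drives pm down while the population stays diverse and back up as entropy collapses, which is the oscillating behavior reported for the entropy curves in Section 4.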
4. Experimental results

This paper presents the genetic algorithm experimental results of four approaches (Schaffer, Fogarty, 1/5 success rule, and entropy-driven) on the suite of benchmark functions from [49]. Experiments on evolution strategies using the same approaches and entropy types are left as future work. Table 2 shows the benchmark functions: f1, f2, and f4 are high-dimensional unimodal functions; f6 is a high-dimensional step function; f7 is a noisy quartic function; f9 to f11 are high-dimensional multimodal functions with many local minima; and f14, f16, f17, and f20 are low-dimensional functions with a few local minima. N denotes the dimensionality of each benchmark function, Rng the range of genotypes, and fmin the global optimum. As noted in [49], for unimodal functions the convergence rates are more interesting than the final optimization results, because other methods are specifically designed to optimize unimodal functions. Conversely, for multimodal functions the final results are much more important, because they reflect an algorithm's ability to escape from poor local optima and locate a good near-global optimum [49].

In our experiments, the numbers of generations for all benchmark functions are the same as in [49]. Because f14, f16, f17, and f20 have relatively small maximum generation numbers, Epoch is set to 5 for these four functions; all other functions use an Epoch stride of 50. These settings allow an evolutionary process to be adaptively balanced at least 200 times on the fly. Users can choose the value of Epoch according to the trade-off between computation power and adaptation sensitivity for the evolutionary process.

To better illustrate the experimental results, the figures in this section consist of eleven curves: best fitness value (B), average fitness value (A), worst fitness value (W), population diversity (D), standard deviation (S), linear entropy (E), Gaussian entropy (G), fitness proportional entropy (P), clustering entropy (C), Rosca's entropy (R), and mutation rate (pm), plotted from generation 0 to the maximum generation number of each function (X-axis). Curves B, A, and W use the same definitions as in other EAs; curves C, E, G, and P are defined in Section 3; curve S is the standard deviation of the fitness values of all individuals (a measure of phenotype diversity); curve D is the Euclidean distance between all individuals (a measure of genotype diversity); and curve R is the entropy defined in [41]. All but the entropy curves (i.e., C, E, G, P, and R) use the left Y-axis. The results in the figures are averages over fifty runs. For space reasons, only the figures for f2, f7, f10, and f14 are presented here, at reduced size; the original figures of all experiments are available at [37].
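The diversity and entropy curves above can be illustrated concretely. The following is a minimal Python sketch, not the paper's definitions (those appear in Section 3): the function names are ours, equal-width fitness bucketing stands in for linear entropy, mean pairwise distance is one reading of "Euclidean distance between all individuals", and a genotype-space bucketing crudely stands in for the clustering entropy, which stays informative when noise makes fitness-based entropies useless (as Section 4.1 shows for f7).

```python
import math

def fitness_stddev(fitness):
    """Curve S: standard deviation of fitness values (phenotype diversity)."""
    mean = sum(fitness) / len(fitness)
    return math.sqrt(sum((f - mean) ** 2 for f in fitness) / len(fitness))

def population_diversity(population):
    """Curve D: mean pairwise Euclidean distance between individuals
    (genotype diversity); mean pairwise distance is one common reading
    of 'distance between all individuals'."""
    n = len(population)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(math.dist(population[i], population[j]) for i, j in pairs) / len(pairs)

def fitness_class_entropy(fitness, num_classes):
    """Shannon entropy over equal-width fitness classes: an illustrative
    analogue of the phenotype entropy curves (E, G, P, R)."""
    lo, hi = min(fitness), max(fitness)
    width = (hi - lo) / num_classes or 1.0  # degenerate case: all equal
    counts = [0] * num_classes
    for f in fitness:
        counts[min(int((f - lo) / width), num_classes - 1)] += 1
    n = len(fitness)
    return -sum((c / n) * math.log(c / n) for c in counts if c)

def genotype_bucket_entropy(population, low, high, num_bins=10):
    """A crude stand-in for curve C (clustering entropy): entropy over
    occupancy of equal-width bins on the first genotype coordinate.
    Being fitness-free, it remains informative under noisy fitness."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for ind in population:
        counts[min(int((ind[0] - low) / width), num_bins - 1)] += 1
    n = len(population)
    return -sum((c / n) * math.log(c / n) for c in counts if c)
```

A converged population yields entropies near zero and small S and D; a diverse population yields values near their maxima, which is what makes these quantities usable as exploration/exploitation signals.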
4.1. Unimodal functions

This subsection presents the experimental results of five unimodal functions (f1, f2, f4, f6, and f7). Early experimental results for the Sphere Model (f1) were published in [29], without the clustering entropy.

Schwefel's Problem 1.2 (f2): Figures 3(a) and (b) as well as Figs 4(a) and (b) show the experimental results of Schwefel's Problem 1.2 (f2) using the Schaffer, Fogarty, 1/5 success rule, and entropy-driven approaches, respectively. As shown in Fig. 3(a), the mutation rate is constant during the evolutionary process. The fitness curves (B, A, W), diversity curves (S and D), and entropy curves reach a stable state after generation 1000. This behavior is mainly driven by the selection process, because all other parameters are held steady. In Fig. 3(b), the mutation rate decreases in small steps to activate more exploitation as the evolutionary process proceeds. The curves show a faster convergence rate than the Schaffer approach; however, the smaller pm leaves insufficient exploration power to search for the optimal solution in the later phase and ends with a worse best fitness value than the previous approach. Figure 4(a) shows the results of the 1/5 success rule approach. Because the success mutation ratio (φ, or RatioM) is greater than 1/5 in the early phase, the exploration power is able to discover promising offspring. This steady exploration helps discover optimal solutions but delays convergence, which conflicts with the goal for unimodal functions stated in [49]. Hence, a more appropriate measure than φ may be needed to decide explicitly when to explore and when to exploit. Figure 4(b) applies the same adjustment steps for pm as the 1/5 success rule approach, but uses linear entropy to balance exploration and exploitation.

For the how question, the mutation rate decreases linearly rather than exponentially as in Fogarty's mutation formulae; the exploration power therefore diminishes more slowly, preventing premature convergence. For the when question, the evolutionary process is directed towards exploitation or exploration based on the current linear entropy value (see Fig. 2). As a result, the optimal solution is discovered and the convergence rate is the fastest among the four approaches. Similar to f2, the experimental results of f1, f4, and f6 presented in [37] show the typical behaviors of the four approaches on unimodal functions.

Noise Quartic Function (f7):
f7 is a quartic function with noise generated by a random number generator. Because of the noise, the mapping between genotype and phenotype is no longer a deterministic one-to-one mapping. Linear, Gaussian, fitness proportional, and Rosca's entropies become useless, because the number of fitness classes (popsize) and the entropy values remain the same all the time. In contrast, the clustering entropy shown in Figs 5 and 6 provides a relatively useful diversity measure: curve C declines along with the fitness and diversity curves. These observations on clustering entropy in f7 have been investigated further and yielded roughly one more order of magnitude of improvement [30]. For the fitness value curves, the Fogarty, 1/5 success rule, and entropy-driven approaches behave very similarly because of very small pm. In comparison, the higher pm of the Schaffer approach plays an important role in discovering/exploring better solutions in the later phase and results in the best solution among the four approaches. This observation shows that even with noise interference, pm still governs the exploration activity in f7. Table 3 summarizes the experimental results for f1, f2, f4, f6, and f7. For ease of comparison, the results of fast evolutionary programming (FEP) and classical evolutionary programming (CEP) from [49] are also included. As shown in Table 3, the entropy-driven approach produces outstanding convergence rates for the selected unimodal functions, which reflects the stated objective for these functions in [49], and generates optimal solutions for f1 and f2. Although the entropy-driven approach does not find optimal solutions for f4, f6, and f7, it still discovers best fitness values that are relatively close; the reason might be that the entropy threshold (0.5) is too high for f4, f6, and f7.
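The entropy threshold mentioned above drives the when decision described in Section 4.1. The following is a minimal sketch of that idea, not the paper's actual rule (which appears in Fig. 2): the threshold direction, step size, cap, and floor are our illustrative assumptions, and an exponential decay stands in for Fogarty-style schedules for comparison.

```python
def fogarty_style_pm(t, pm0=0.1, decay=0.995):
    """Exponentially decaying mutation rate (Fogarty-style shape;
    these constants are illustrative, not Fogarty's originals)."""
    return pm0 * decay ** t

def entropy_driven_pm(pm, entropy, max_entropy, step=1e-3, threshold=0.5):
    """One Epoch of entropy-driven control: while normalized entropy
    is above the threshold the population is still diverse, so pm is
    raised to keep exploring; once it drops below, pm is decreased
    linearly to shift towards exploitation. A higher threshold thus
    shortens the period in which exploration dominates."""
    if entropy / max_entropy > threshold:
        return min(pm + step, 0.25)   # explore
    return max(pm - step, 1e-4)       # exploit
```

The linear decrement preserves late-phase exploration power: at generation 1500 the exponential schedule above has decayed to roughly 5E–05, whereas a linear per-Epoch decrease from 0.1 still leaves a usable mutation rate.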
The high threshold decreases the amount of time in which exploration dominates (as shown in Figs 6(a) and (b)). The fixed threshold should be improved in the future to reflect function-specific characteristics.

4.2. Multimodal functions with many local minima

This subsection presents three multimodal functions (i.e., f9, f10, and f11) with many local minima. As shown in the following paragraphs, the entropy-driven approach performs well in balancing exploration and exploitation explicitly as well as in finding optimal solutions over many local minima, even though the con-
Fig. 3. Schwefel's Problem 1.2 (f2) – part I.
Fig. 4. Schwefel's Problem 1.2 (f2) – part II.
Fig. 5. The Noise Quartic Function (f7) – part I.
Fig. 6. The Noise Quartic Function (f7) – part II.
Table 3
The statistics of f1, f2, f4, f6, and f7

Benchmark Function    Approach   Mean Best   Standard Deviation   Convergence Generation+
f1 (Maxgen = 1500)    Schaffer   6.82E–08    0.00E+00             830
                      Fogarty    6.82E–08    0.00E+00             527
                      1/5        6.82E–08    0.00E+00             1274
                      Entropy    6.82E–08    0.00E+00             467
                      FEP*       5.70E–04    1.30E–04             N/A
                      CEP*       2.20E–04    5.90E–04             N/A
f2 (Maxgen = 2000)    Schaffer   1.43E–04    0.00E+00             730
                      Fogarty    2.79E–04    0.00E+00             453
                      1/5        1.43E–04    0.00E+00             1015
                      Entropy    1.43E–04    0.00E+00             413
                      FEP*       8.10E–03    7.70E–04             N/A
                      CEP*       2.60E–03    1.70E–04             N/A
f4 (Maxgen = 5000)    Schaffer   3.24E–01    9.76E–07             3122
                      Fogarty    4.12E+01    5.64E–15             3120
                      1/5        2.60E–01    1.33E–16             1369
                      Entropy    2.77E–01    2.22E–16             1250
                      FEP*       3.00E–01    5.00E–01             N/A
                      CEP*       2.00E+00    1.20E+00             N/A
f6 (Maxgen = 1500)    Schaffer   6.00E–02    0.00E+00             1350
                      Fogarty    1.44E+00    0.00E+00             1379
                      1/5        1.40E–01    0.00E+00             613
                      Entropy    6.00E–01    0.00E+00             519
                      FEP*       0.00E+00    0.00E+00             N/A
                      CEP*       5.78E+02    1.23E+03             N/A
f7 (Maxgen = 3000)    Schaffer   4.56E–03    6.45E–04             2917
                      Fogarty    1.37E–02    8.37E–04             2984
                      1/5        2.17E–02    1.51E–04             2232
                      Entropy    3.06E–02    1.55E–04             1771
                      FEP*       7.60E–03    2.60E–03             N/A
                      CEP*       1.80E–02    6.40E–03             N/A

+ The convergence generation is the point at which curve Best becomes flat in the figure.
* FEP and CEP denote the fast and classical evolutionary programming results from [49]. N/A: Not Available.
vergence rates are not the quickest among all the approaches.

Ackley's Function (f10): Figures 7 and 8 show the experimental results of f10. For the Schaffer results in Fig. 7(a), the fitness curves still improve significantly at the late stage owing to the constantly high pm. The slower decline of the entropy curves and the energetic changes of the diversity curves for f10 indicate that the exploration power remains effective in helping the evolutionary process escape from poor local optima. For the Fogarty results in Fig. 7(b), the relatively small exploration power defined by its deterministic formula hampers the search for an optimal solution. Even though exploration is still active, the very small pm does not provide enough exploration power to help the evolutionary process jump out of local optima. Hence, the experimental results of the Fogarty approach are the worst among the four approaches. For the 1/5 success rule shown in Fig. 8(a), sufficient exploration power from the more slowly declining pm results in a better solution at the later stage. The same phenomenon can be observed in the results of the entropy-driven approach shown in Fig. 8(b). The main difference between the latter two approaches is the tendency of pm towards more exploitation (the 1/5 success rule approach) or more exploration (the entropy-driven approach). Because the objective for multimodal functions is to find optimal solutions [49], retaining exploration power at the later stage, as the entropy-driven approach does, may help the evolutionary process discover even better solutions if f10 were run for more generations. Similar to f10, the experimental results of f9 and f11 in [37] show the representative behaviors of the four approaches on multimodal functions with many local minima. For multimodal functions, Yao et al. [49] stated that the abilities of escaping from poor local opti-
Fig. 7. Ackley's Function (f10) – part I.
Fig. 8. Ackley's Function (f10) – part II.
Table 4
The statistics of f9, f10, and f11

Benchmark Function     Approach   Mean Best   Standard Deviation   Convergence Generation+
f9 (Maxgen = 5000)     Schaffer   2.09E+01    1.18E–04             3988
                       Fogarty    4.55E+01    2.01E–05             4079
                       1/5        2.52E+01    3.05E–14             1265
                       Entropy    2.40E+01    1.24E–05             3023
                       FEP*       4.60E–02    12.0E–02             N/A
                       CEP*       8.90E+01    2.31E+01             N/A
f10 (Maxgen = 1500)    Schaffer   3.05E–01    1.21E–03             1492
                       Fogarty    1.81E+00    3.48E–09             1010
                       1/5        7.05E–02    1.06E–03             1500
                       Entropy    3.90E–01    1.51E–02             1492
                       FEP*       1.80E–02    2.10E–03             N/A
                       CEP*       9.20E+00    2.80E+00             N/A
f11 (Maxgen = 2000)    Schaffer   3.45E–02    2.65E–10             1089
                       Fogarty    5.75E–02    6.89E–17             1264
                       1/5        4.29E–02    5.69E–17             1034
                       Entropy    5.36E–02    5.84E–17             876
                       FEP*       1.60E–02    2.20E–02             N/A
                       CEP*       8.60E–02    1.20E–01             N/A

+ The convergence generation is the point at which curve Best becomes flat in the figure.
* FEP and CEP denote the fast and classical evolutionary programming results for the multimodal functions with many local minima from [49]. N/A: Not Available.
ma and locating a good near-global optimum determine the significance of an evolutionary process. Our results indicate that the Fogarty approach is not appropriate for multimodal functions, because multimodal functions still require sufficient exploration power to escape from poor local optima at the later stage, whereas the original and revised Fogarty formulae reduce mutation steps to promote more exploitation. Figure 7(b), the figures in [37], and Table 4 all support this observation. The adaptive approaches (i.e., 1/5 success rule and entropy-driven) generate good results for f10 and f11, both in finding optimal solutions and in escaping from poor local optima at the later stage. Additionally, although standard deviation and Euclidean distance could also be used to address the when question in the entropy-driven approach, the drastic changes in curves S and D shown in Fig. 7(b) make it much harder to decide whether more exploration or more exploitation is needed.

4.3. Multimodal functions with only a few local minima

This subsection presents four multimodal functions (f14, f16, f17, and f20) with only a few local minima. The experimental results show that the entropy-driven approach discovers better solutions among a few local minima.

Shekel's Foxholes Function (f14): For f14, there are only a few local minima and a small maximum generation number, so the evolutionary process cannot guarantee to discover all local optima using the Schaffer approach. In Fig. 9(a), the renewed activity of the diversity curves after generation 50 shows that a few local optima are found in this phase. Fortunately, the evolutionary process still possesses enough exploration power to improve the Mean Best value1. Similar to the previous experimental results, the Fogarty approach for f14 also produces small late-stage refinements of the Mean Best value (shown in Fig. 9(b)); the slightly different results between the two approaches may stem from the earlier decrease of pm in the Fogarty approach. Figure 10(a) shows that pm decreases throughout the evolutionary process. This indicates that the success mutation ratio is always below the ideal value of 0.2, so the entire process inclines towards exploitation; after generation 30 there is no explicit exploration activity. Two possible explanations are: (1) all local minima are found during the exploration phase; or (2) the pm value is too small to explore the few local optima in a large search space.
1 Mean Best value decreases from 2.68E+00 to 2.59E+00. Please refer to [37] for further details.
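The success mutation ratio logic discussed above follows Rechenberg's 1/5 success rule, applied here to the mutation rate (the paper's RatioM). A minimal sketch, where the adjustment factor 0.85 is a conventional choice from the evolution strategies literature rather than a value taken from this paper:

```python
def one_fifth_success_rule(pm, success_ratio, factor=0.85, target=0.2):
    """Adjust the mutation rate from the ratio of successful mutations:
    above the 1/5 target, widen the search (more exploration);
    below it, narrow the search (more exploitation)."""
    if success_ratio > target:
        return pm / factor
    if success_ratio < target:
        return pm * factor
    return pm
```

When the success ratio stays below 0.2 throughout, as observed for f14 in Fig. 10(a), pm decays monotonically and the process drifts towards pure exploitation.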
Fig. 9. The Shekel's Foxholes Function (f14) – part I.
Fig. 10. The Shekel's Foxholes Function (f14) – part II.
Table 5
The statistics of f14, f16, f17, and f20

Benchmark Function    Approach   Mean Best    Standard Deviation   Convergence Generation+
f14 (Maxgen = 100)    Schaffer   2.59E+00     2.51E–15             15
                      Fogarty    2.88E+00     2.45E–15             16
                      1/5        2.33E+00     2.30E–15             22
                      Entropy    1.10E+00     3.06E–08             72
                      FEP*       1.22E+00     5.60E–01             N/A
                      CEP*       1.66E+00     1.19E+00             N/A
f16 (Maxgen = 100)    Schaffer   −1.02E+00    5.87E–05             10
                      Fogarty    −1.02E+00    5.62E–08             10
                      1/5        −1.02E+00    1.16E–15             12
                      Entropy    −1.02E+00    7.66E–04             97
                      FEP*       −1.03E+00    4.90E–07             N/A
                      CEP*       −1.03E+00    4.90E–07             N/A
f17 (Maxgen = 100)    Schaffer   4.21E–01     1.17E–09             9
                      Fogarty    4.32E–01     7.58E–16             9
                      1/5        4.34E–01     7.51E–16             13
                      Entropy    3.99E–01     1.29E–03             98
                      FEP*       3.98E–01     1.50E–07             N/A
                      CEP*       3.98E–01     1.50E–07             N/A
f20 (Maxgen = 200)    Schaffer   −3.22E+00    5.61E–05             24
                      Fogarty    −3.24E+00    3.50E–10             45
                      1/5        −3.24E+00    3.11E–15             32
                      Entropy    −3.25E+00    2.84E–04             100
                      FEP*       −3.27E+00    5.90E–02             N/A
                      CEP*       −3.28E+00    5.80E–02             N/A

+ The convergence generation is the point at which curve Best becomes flat in the figure.
* FEP and CEP denote the fast and classical evolutionary programming results from [49]. N/A: Not Available.
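The convergence generation reported in Tables 3–5 is defined in the footnotes as the point where curve Best becomes flat. A minimal sketch of such a detector, assuming "flat" means no further change beyond a tolerance (the tolerance and scan strategy are our assumptions, not the paper's procedure):

```python
def convergence_generation(best, tol=1e-12):
    """Return the first generation g after which the best-fitness curve
    stays flat, i.e. every later value differs from best[g] by at most
    tol; falls back to the last generation if the curve never settles."""
    for g in range(len(best)):
        if all(abs(b - best[g]) <= tol for b in best[g:]):
            return g
    return len(best) - 1
```

Applied to a Best curve averaged over fifty runs, a detector of this kind makes the "+" entries of the tables mechanically reproducible.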
Although f14 has only a few local minima, the entropy-driven approach still achieves a good balance between exploration and exploitation and finds even better solutions at the end of the evolutionary process. Figure 10(b) presents the same characteristics (i.e., rising pm, drastically changing entropy curves, and decreasing fitness value curves) that appear in the figures for f16, f17, and f20 in [37]. The experimental results in Figs 9 and 10, the figures in [37], and Table 5 show that the entropy-driven approach outperforms or equals all other approaches on the multimodal functions with a few local minima. The results also echo the objective of EAs for multimodal functions stated in [49] (i.e., finding even better solutions). Even though the Schaffer, Fogarty, and 1/5 success rule approaches may discover local optima regions during the evolutionary process (e.g., f14 and f17), the exploration power available at the later stage is not strong enough to escape from local optima and relocate to a better solution. Even worse, the early decline of pm may reduce the chance of finding the few local optima and the optimum among them (e.g., f16 and f20 in [37]).
5. Conclusion

In this paper, a novel entropy-driven approach is introduced in which entropy acts as a succinct measure of diversity suitable for controlling the balance between exploration and exploitation. This balance is achieved by the synergy of pm and one type of entropy, which adaptively tackle the how and when questions, respectively. Across a large number of experiments on twelve benchmark functions, our approach outperforms or equals the Schaffer, Fogarty, and 1/5 success rule approaches in fitness results and/or convergence rates. This paper also presents four new types of entropy: linear, Gaussian, fitness proportional, and clustering. The experimental results show that linear entropy and Rosca's entropy are the most appropriate for the entropy-driven approach on most functions, with the exception of f7; the problem with f7 stems from the nature of the noisy quartic function and is studied further in [30]. Although the entropy-driven experimental results for some benchmark functions did not outperform the others, the results are close and promising, and worth fur-
ther study. The current entropy-driven exploration and exploitation approach uses entropy as a diversity measure to adjust the mutation rate in a simple metaprogramming fashion. Extending the approach to other processes/operators (e.g., selection, mutation, crossover, population resizing) and experimenting with more kinds of entropy are our future plans. We also plan to experiment with NK fitness landscapes [1] and tunable fitness landscapes [39] using PPCEA along with a tunable landscape generator [16] to explore the interdependencies between epistasis and local optima. Additionally, because of the metaprogramming nature of PPCEA, the entropy-driven approach can easily be written in various flexible ways (e.g., advancing the parameter tuning of the entropy boundary value, i.e., lines 11 and 14 in Fig. 2, towards more adaptive schemes) to improve the solutions for f4, f6, and f7. Hence, another goal of this paper is to initiate the entropy-driven study of evolutionary algorithms and to promote further research on this topic in the evolutionary computation community. Lastly, there are various approaches to the exploration and exploitation problem aimed at obtaining optimal solutions and/or faster convergence rates. To the best of our knowledge, none of the current approaches provides a dominant means of addressing both the optimal solution and the convergence rate for both unimodal and multimodal functions. The experimental results show that the entropy-driven approach generates consistently good results that reflect the goals stated in [49] for both unimodal and multimodal functions.
References

[1] L. Altenberg, NK Fitness Landscapes, in: Handbook of Evolutionary Computation, Institute of Physics Press, 1996.
[2] H.B. Amor and A. Rettinger, Intelligent Exploration for Genetic Algorithms: Using Self-Organizing Maps in Evolutionary Computation, Genetic and Evolutionary Computation Conf., pp. 1531–1538, 2005.
[3] T. Bäck, Selective Pressure in Evolutionary Algorithms: A Characterization of Selection Mechanisms, 1st Conf. on Evolutionary Computing, pp. 57–62, 1994.
[4] T. Bäck, D.B. Fogel and Z. Michalewicz, Handbook of Evolutionary Computation, University of Oxford Press, 1996.
[5] T. Bäck and H.-P. Schwefel, Evolution Strategies I: Variants and Their Computational Implementation, in: Genetic Algorithms in Engineering and Computer Science, John Wiley & Sons, 1995.
[6] J. Brest, S. Greiner, B. Boskovic, M. Mernik and V. Zumer, Self-Adapting Control Parameters in Differential Evolution: A Comparative Study on Numerical Benchmark Problems, IEEE Trans on Evolutionary Computation 10(6) (2006), 646–657.
[7] E. Burke, S. Gustafson, G. Kendall and N. Krasnogor, Advanced Population Diversity Measures in Genetic Programming, Parallel Problem Solving from Nature – PPSN VII, Springer Verlag LNCS 2439, 341–350, 2002.
[8] N. Chaiyaratana, T. Piroonratana and N. Sangkawelert, Effects of Diversity Control in Single-Objective and Multi-Objective Genetic Algorithms, Journal of Heuristics 13 (2007), 1–34.
[9] K. De Jong, The Analysis of the Behavior of a Class of Genetic Adaptive Systems, Ph.D. thesis, Dept. of Computer Science, University of Michigan, Ann Arbor, 1975.
[10] D. Dumitrescu, B. Lazzerini, L.C. Jain and A. Dumitrescu, Evolutionary Computation, CRC Press, 2007.
[11] A. Eiben, R. Hinterding and Z. Michalewicz, Parameter Control in Evolutionary Algorithms, IEEE Trans on Evolutionary Computation 3(2) (1999), 124–141.
[12] A. Eiben, E. Marchiori and V.A. Valko, Evolutionary Algorithms with on-the-fly Population Size Adjustment, Parallel Problem Solving from Nature – PPSN VIII, Springer Verlag LNCS 3242, 41–50, 2004.
[13] A. Eiben and C. Schippers, On Evolutionary Exploration and Exploitation, Fundamenta Informaticae 35 (1998), 35–50.
[14] L. Eschelman and J. Schaffer, Preventing Premature Convergence in Genetic Algorithms by Preventing Incest, 4th Intl. Conf. on Genetic Algorithms, pp. 115–122, 1991.
[15] T.C. Fogarty, Varying the Probability of Mutation in the Genetic Algorithm, 3rd Intl. Conf. on Genetic Algorithms, pp. 104–109, 1989.
[16] M. Gallagher and B. Yuan, A General-Purpose Tunable Landscape Generator, IEEE Trans on Evolutionary Computation 10(5) (2006), 590–602.
[17] K.S. Goh, A. Lim and B. Rodrigues, Sexual Selection for Genetic Algorithms, Artificial Intelligence Review 19 (2003), 123–152.
[18] D.A. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, 1989.
[19] J.J. Grefenstette, Optimization of Control Parameters for Genetic Algorithms, IEEE Trans on Systems, Man & Cybernetics SMC-16(1) (1986), 122–128.
[20] T. Grüninger, Multimodal Optimization using Genetic Algorithms, Master Thesis, Stuttgart University, 1996.
[21] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
[22] G. Harik, Finding Multimodal Solutions Using Restricted Tournament Selection, 6th Intl. Conf. on Genetic Algorithms, pp. 24–31, 1995.
[23] G. Harik and F. Lobo, A Parameter-less Genetic Algorithm, Technical Report IlliGAL 9900, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, 1999.
[24] T.-Y. Huang and Y.-Y. Chen, Diversity-based Selection Pooling Scheme in Evolution Strategies, 16th ACM Symposium on Applied Computing, pp. 351–355, 2001.
[25] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 2001.
[26] C.-Y. Lee, Entropy-Boltzmann Selection in the Genetic Algorithms, IEEE Trans on Systems, Man and Cybernetics 33(1) (2003), 138–142.
[27] S.-H. Liu, QoSPL: A Quality of Service-Driven Software Product Line Engineering Framework for Design and Analysis of Component-Based Distributed Real-Time and Embedded Systems, Ph.D. thesis, Dept. of Computer and Information Sciences, University of Alabama at Birmingham, 2007.
[28] S.-H. Liu, M. Mernik and B.R. Bryant, Parameter Control in Evolutionary Algorithms by Domain-Specific Scripting Language PPCEA, 1st Intl. Conf. on Bioinspired Optimization Methods and their Applications, pp. 41–50, 2004.
[29] S.-H. Liu, M. Mernik and B.R. Bryant, Entropy-driven Parameter Control for Evolutionary Algorithms, Informatica: An Intl. Journal of Computing and Informatics 31(1) (2007), 41–50.
[30] S.-H. Liu, M. Mernik and B.R. Bryant, A Clustering Entropy-driven Approach for Exploring and Exploiting Noisy Functions, 22nd ACM Symposium on Applied Computing, pp. 738–742, 2007.
[31] M. Lozano, F. Herrera and J.R. Cano, Replacement Strategies to Maintain Useful Diversity in Steady-State Genetic Algorithms, Soft Computing: Methodologies and Applications (2005), 85–96.
[32] W. Lu and I. Traoré, A New Evolutionary Algorithm for Determining the Optimal Number of Clusters, 17th Intl. Conf. on Tools with Artificial Intelligence, pp. 712–713, 2005.
[33] W.N. Martin, J. Lienig and J.P. Cohoon, Island (Migration) Models: Evolutionary Algorithms based on Punctuated Equilibria, in: Evolutionary Computation 2, Chapter 15, T. Bäck et al., eds, Institute of Physics Publishing, Bristol, UK, 2000.
[34] M. Mernik, J. Heering and A. Sloane, When and How to Develop Domain-Specific Languages, ACM Computing Surveys 37(4) (2005), 316–344.
[35] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1996.
[36] N. Mori, J. Yoshida, H. Tamaki and H. Nishikawa, A Thermodynamical Selection Rule for the Genetic Algorithm, 2nd Intl. Conf. on Evolutionary Computation, pp. 188–192, 1995.
[37] PPCEA: A Domain-Specific Language for Evolutionary Algorithms, http://www.cis.uab.edu/liush/PPCea.htm.
[38] W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes: The Art of Scientific Computing, Cambridge University Press, 2007.
[39] C.R. Reeves, Experiments with Tunable Fitness Landscapes, Parallel Problem Solving from Nature – PPSN VI, Springer Verlag LNCS 1917, 139–148, 2000.
[40] E. Ronald, When Selection Meets Seduction, 6th Intl. Conf. on Genetic Algorithms, pp. 167–173, 1995.
[41] J. Rosca, Entropy-Driven Adaptive Representation, Workshop on Genetic Programming: From Theory to Real-World Applications, pp. 23–32, 1995.
[42] J.D. Schaffer, R.A. Caruana, L.J. Eshelman and R. Das, A Study of Control Parameters Affecting Online Performance of Genetic Algorithms for Function Optimization, 3rd Intl. Conf. on Genetic Algorithms, pp. 51–60, 1989.
[43] C. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal 27 (1948), 379–423, 623–656.
[44] H. Shimodaira, A Diversity Control Oriented Genetic Algorithm, 2nd Intl. Conf. on Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 444–449, 1997.
[45] R.E. Smith, Adaptively Resizing Populations: Algorithm and Analysis, Technical Report TCGA #93001, University of Alabama, 1993.
[46] J.E. Smith and T.C. Fogarty, Operator and Parameter Adaptation in Genetic Algorithms, Soft Computing 1(2) (1997), 81–87.
[47] R. Ursem, Diversity-Guided Evolutionary Algorithms, Parallel Problem Solving from Nature – PPSN VII, Springer Verlag LNCS 2439 (2002), 462–471.
[48] Wikipedia, http://en.wikipedia.org/wiki/Bijection%2C injection and surjection, 2008.
[49] X. Yao, Y. Liu and G. Lin, Evolutionary Programming Made Faster, IEEE Trans on Evolutionary Computation 3(2) (1999), 82–102.