Function Optimisation using Multiple-Base Population Based Incremental Learning M.P. Servais, G. de Jager, J.R. Greene
Abstract — Population Based Incremental Learning (PBIL) is a stochastic search technique which combines characteristics of both the Genetic Algorithm and competitive learning. It has been shown to be a simple, yet widely effective function optimisation strategy. Applications of PBIL to date include: selecting the weights in a neural network, image registration, filter design and feature set extraction. A generalisation of PBIL to Base-N PBIL is presented and discussed. Following this, a Multiple Base version of PBIL (MB-PBIL) is presented, which attempts to optimise a function by searching in a variety of bases. (Similar strategies have been shown to be effective for hill-climbers.) Finally, it is shown that for the standard test functions used, MB-PBIL performed better than standard PBIL.
I. Introduction

Population Based Incremental Learning (PBIL) was introduced by Baluja in 1994 [1]. It is a function optimisation technique which combines characteristics of both the Genetic Algorithm (GA) and competitive learning. The PBIL algorithm uses a probability vector which, when sampled, yields relatively high quality solution vectors with high probability. The solution vectors which are generated by sampling the probability vector are expressed as binary strings. These binary strings are then mapped to the domain over which the function is defined.

It has been shown that randomly re-mapping space via base changes provides a simple means of applying multiple search strategies to a given search problem, and that this offers a pragmatic means of probing a cost function from many views. [2] Kingdon and Dekker applied such a re-mapping of space to hill-climbers, and showed that hill-climbers using base changes performed significantly better than various implementations of the GA over several functions. [2]

This article describes a similar re-mapping of space, applied to a generalised (i.e. non-binary) version of PBIL. First, PBIL is extended to include non-binary implementations. Following this, a multiple base implementation of PBIL is discussed, which allows for transition between bases. The performance of multiple base PBIL relative to standard PBIL is also discussed.

II. Extending PBIL to Base-N

A. Representation of the Probability Vector

A population member in PBIL is expressed as a binary string. It consists of D x P bits, where D is the number of dimensions in the input space (the function domain) and P is the number of bits per dimension used to represent a particular point in the space. Thus P represents the precision per dimension (in bits). A probability vector (PV) in PBIL is expressed as a real-valued vector with elements in the range [0,1]. A PV also has D x P elements (from which population members are generated). An alternative (but equivalent) way of expressing a PV is to list, for each element of the PV, the separate probabilities of selecting a 0 or a 1 (which naturally sum to unity for each element). An example of a (Base 2) PV element expressed in this way is illustrated in Figure 1(a). Note that the probabilities in each element sum to unity.

In general, a Probability Vector need not be expressed in binary format. A PV expressed in base N will generate elements which are integers in the range [0, N-1]. Some typical non-binary examples of PV elements are illustrated in Figures 1(b) and (c). An important point worth noting is that a base-N probability vector with P elements per dimension can represent N^P distinct values per dimension. Consequently, a precision of P bits per dimension requires a Base-N probability vector to consist of D(P / log_2 N) elements.

Fig. 1. A representation of some typical elements of Probability Vectors: (a) Base 2 (with a relatively uniform pdf); (b) Base 5 (with a pdf which is close to converging on n = 1); (c) Base 8.

Department of Electrical Engineering, University of Cape Town. E-mail: [email protected]
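To make the representation concrete, a Base-N probability vector can be stored as one list of N probabilities per PV element and sampled element-wise to generate population members. The sketch below is illustrative only (the paper's implementation was in MATLAB, and these names are not from the paper):

```python
import random

def uniform_pv(num_elements, base):
    # A fresh PV starts with every value equally likely in each element.
    return [[1.0 / base] * base for _ in range(num_elements)]

def sample_member(pv):
    # Sampling the PV yields one population member: one integer in
    # [0, base-1] per PV element, drawn according to that element's pdf.
    return [random.choices(range(len(elem)), weights=elem)[0] for elem in pv]

# D = 2 dimensions at P = 4 bits of precision in base N = 4 needs
# D * (P / log2(N)) = 2 * (4 / 2) = 4 PV elements.
pv = uniform_pv(4, 4)
member = sample_member(pv)
```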
B. Learning

One essential characteristic of PBIL is its method of learning, based on the fittest member of each population. [1] This method cannot be directly extended to Base-N. However, a general learning method was devised which is similar to that used in standard PBIL. The learning procedure used is described below.

Consider one element, E, of a Base-N probability vector. Then E consists of a set of probabilities p(n), with 0 <= n <= N-1, where p(n) indicates the probability of the PV generating the value n for element E of a population member in the next generation. (Note that for each PV element E, the sum over n of p(n) is 1.) Assume that the fittest member of the latest population had the value k (0 <= k <= N-1) as its E'th element. Then the probability of generating the value k as the E'th element of a population member in the next generation should increase. In general, the probability of generating values close to k (modulo N) should increase slightly, while the probability of generating values relatively far from k (modulo N) should decrease slightly. This can be expressed quantitatively according to the following rule: for each p(n) in E, modify p(n) by:
    p(n) = p(n)(1 - LR_N) + (4 LR_N / N^2)(N/2 - |n - k|_N)        (1)

where |n - k|_N denotes the circular (modulo-N) distance min(|n - k|, N - |n - k|),
and the Learning Rate, LR_N, is a real number in the range [0,1] such that p(k) increases by a factor of (1 + LR_N) in one generation. Following this, each p(n) should be normalised to ensure that the probabilities for each element of the PV sum to unity. The learning described in Equation 1 is illustrated with an example: Figure 2 shows the effect of learning on element E of a Base-8 PV. The value of the LR_N parameter is further discussed in Section II-D, where it is shown that the learning rate should be a function of the base, N.
Fig. 2. The effect of learning on element E of a Base 8 probability vector (assuming a Learning Rate, LR_N, of 0.3). (a) Initially, all values of p(n) are equal. (b) After the members of the first generation have been evaluated, E is updated according to the fittest member (which is assumed to have a value of k = 1). (c) After another nine generations in which the fittest member is assumed to have had k = 1 (in each generation), E can be seen to be converging towards n = 1. (d) After another four generations in which the fittest member is assumed to have had k = 3 (in each generation), E can be seen to be less converged. Note that after these 14 generations, the probability of E generating a value of n = 5 is negligible.

C. Mutation

Another important characteristic of PBIL is mutation. [1] Mutation helps to prevent premature convergence of the PV. This in turn reduces the likelihood of a local minimum/maximum being selected as the optimal solution.
In Base-N PBIL, mutation can be implemented fairly simply, as described below. Consider one element, E, of a Base-N probability vector, as described in Section II-B above. Once E has been updated according to the learning procedure described above, add a constant to each p(n) in E (0 <= n <= N-1). This constant is referred to as the mutation rate, MR_N. Following this, normalise E so that the probabilities p(0), ..., p(N-1) again sum to unity. The mutation described above is illustrated with an example: Figure 3 shows the effect of learning and mutation on element E of a Base-8 PV. The relationship between LR_N and MR_N is further discussed in Section II-D.
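The learning and mutation steps above can be sketched as a single element update. The following is a minimal sketch of Equation 1 plus mutation and normalisation (Python, with illustrative names; not the authors' MATLAB code):

```python
def circular_distance(n, k, base):
    # Distance between n and k, modulo the base.
    d = abs(n - k) % base
    return min(d, base - d)

def update_element(elem, k, lr, mr):
    # Learning (Equation 1): shift probability towards the fittest
    # member's value k, linearly in circular distance from k.
    base = len(elem)
    elem = [p * (1 - lr)
            + (4 * lr / base ** 2) * (base / 2 - circular_distance(n, k, base))
            for n, p in enumerate(elem)]
    # Mutation: add the constant MR_N to every probability ...
    elem = [p + mr for p in elem]
    # ... then normalise so the element's probabilities sum to unity.
    total = sum(elem)
    return [p / total for p in elem]
```

Starting from a uniform Base-8 element with k = 1, the updated element peaks at n = 1, mirroring Figure 2(b).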
D. Standardising Learning and Mutation Rates for Different Bases

In standard PBIL, the convergence of a PV is measured by calculating the extent to which the PV elements have shifted toward either 0 or 1. An equivalent way of determining convergence in Base-N PBIL is to consider the maximum probability value in each element of the probability vector; the PV has converged when the average of the maximum values of the PV elements tends to unity. This section describes how this concept of convergence can be used to standardise the learning and mutation rates across different bases. As before, consider one element, E, of a Base-N probability vector, consisting of a set of probabilities p(n) with 0 <= n <= N-1. Define the convergence value, C_N(g), as the maximum value of p(n) within element E after g generations. This convergence value is useful in standardising both LR_N and MR_N.
D.1 The Learning Rate, LR_N

In order to standardise the learning rate over a range of bases, it is first necessary to make a number of assumptions:
- There is no mutation (i.e. MR_N = 0).
- During learning, there is no normalisation of PV elements (to ensure that probabilities sum to unity). This is not logically reasonable; however, in practice the deviation from unity is insignificant over relatively few generations.
- The same population member is the fittest over the first g_0 generations. This is not characteristic of the normal performance of PBIL; the assumption is made purely because the cumulative effect of the learning process is under consideration.

Subject to the above assumptions, the learning rate can be standardised in the following sense: choose LR_N (as a function of N) such that after exactly g_0 generations,

    C_N(g_0) = 1        (2)

for all bases N. This implies that convergence occurs in all bases during generation g_0. Equivalently, this may be expressed as:

    (1/N)(1 + LR_N)^g_0 = 1        (3)

Fig. 3. The effect of learning and mutation on element E of a Base 8 probability vector (assuming a Learning Rate (LR_N) of 0.3 and a Mutation Rate (MR_N) of 0.01). (a) Initially, all values of p(n) are equal. (b) After the members of the first generation have been evaluated, E is updated according to the fittest member (which is assumed to have a value of k = 1). (c) After another nine generations in which the fittest member is assumed to have had k = 1 (in each generation), E can be seen to be converging towards n = 1. (d) After another four generations in which the fittest member is assumed to have had k = 3 (in each generation), E can be seen to be less converged. Note that after these 14 generations, the probability of E generating a value of n = 5 is small, but not negligible (as was the case without mutation).
From Equation 3 it is possible to derive LR_N as a function of the learning rate for Base 2 (LR_2) as follows:

    LR_N = N^(log_2(1 + LR_2)) - 1        (4)

D.2 The Mutation Rate, MR_N

The mutation rate may be standardised in a similar way across a range of bases. Following the first generation, and the application of learning and mutation to the PV, the convergence value defined above may be calculated to be:

    C_N(1) = ((1 + LR_N)/N + MR_N) / (1 + N MR_N)        (5)

Solving for the Mutation Rate (MR_N) yields:

    MR_N = ((1 + LR_N)/N - C_N(1)) / (N C_N(1) - 1)        (6)
Recall that a Base-N PV with P elements per dimension can represent N^P distinct values per dimension. Thus to represent 16 distinct values requires: one Base-16 PV element; or two Base-4 PV elements; or four Base-2 PV elements. Then, in order to ensure that learning and convergence proceed at equivalent rates in any of the three bases, it is required that (after one generation) C_16(1) = (C_4(1))^2 = (C_2(1))^4. In general, in order to ensure that the combined effect of learning and mutation is standardised for any base, C_N(1) can be calculated relative to the desired value of C_2(1) as:

    C_N(1) = C_2(1)^(log_2 N)        (7)

Thus, given a learning rate for Base 2, it is possible to determine an equivalent learning rate for Base N using Equation 4. Similarly, given a convergence value for Base 2 (e.g. the convergence value after one generation), it is possible to determine an equivalent convergence value for Base N (using Equation 7), and hence the corresponding mutation rate for Base N (using Equation 6). Table I gives some typical values for learning and mutation rates over a variety of bases.
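Equations 4, 6 and 7 make it possible to compute equivalent rates for any base from LR_2 and C_2(1) alone. The sketch below (Python; illustrative only, not the authors' code) reproduces the values of Table I to within rounding:

```python
import math

def lr_n(base, lr2):
    # Equation 4: equivalent learning rate for base N, given LR_2.
    return base ** math.log2(1 + lr2) - 1

def c_n(base, c2):
    # Equation 7: equivalent one-generation convergence value.
    return c2 ** math.log2(base)

def mr_n(base, lr2, c2):
    # Equation 6: mutation rate that yields the required C_N(1).
    cn = c_n(base, c2)
    return ((1 + lr_n(base, lr2)) / base - cn) / (base * cn - 1)

# With LR_2 = 0.2 and C_2(1) = 0.595, print the rows of Table I.
for base in (2, 5, 8):
    print(base, lr_n(base, 0.2), c_n(base, 0.595), mr_n(base, 0.2, 0.595))
```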
TABLE I
Typical equivalent Learning (LR_N) and Mutation (MR_N) rates

    Base (N)    LR_N     C_N(1)    MR_N
    2           0.200    0.595     0.026
    5           0.527    0.300     0.011
    8           0.728    0.211     0.007

III. Implementation of Multiple Base PBIL

This section describes an implementation of Multiple Base PBIL (MB-PBIL) based on the characteristics of Base-N PBIL described in Section II. The essential idea behind MB-PBIL is to search a fitness function using Base-N PBIL in a particular base, and then to change to another base once the search in the original base fails to improve on the best result (within a certain number of generations). The process is repeated until specified stopping criteria are reached. A pseudo-code implementation of MB-PBIL is listed below (it is assumed that maximisation of the fitness function is required).

A. The MB-PBIL Algorithm

    FITTEST_VAL = -1
    GEN = 5
    % First, search in each base for 5 generations to find a good starting base.
    for N = START_BASE to END_BASE
        % Search the fitness function for GEN generations.
        [FITNESS, PV] = BaseN_PBIL(N, GEN)
        if (FITNESS > FITTEST_VAL)
            FITTEST_VAL = FITNESS
            FITTEST_VAL_PV = PV
            BEST_N = N
        end
    end
    % Then continue searching, starting from the best value found so far.
    % If a particular base does not lead to a better fitness value within
    % 20 generations, then randomly change the base.
    N = BEST_N
    GEN = 20
    do while (stopping criteria not satisfied)
        [FITNESS, PV] = BaseN_PBIL(N, GEN, FITTEST_VAL_PV)
        if (FITNESS > FITTEST_VAL)
            FITTEST_VAL = FITNESS
            FITTEST_VAL_PV = PV
            BEST_N = N
        else
            N = random(START_BASE, END_BASE)
        end
    loop

B. Conversion of the Probability Vector

One aspect of Multiple Base PBIL that requires further clarification is the mapping of a PV in Base N1 to a PV in Base N2. The most accurate method would be to calculate the probabilities of each possible combination of elements. This would be a time-consuming process, and would still require some approximations (since a PV in Base N1 may span the search space with less precision than a PV in Base N2). Because of these difficulties, it was decided to use a simpler (though slightly less accurate) approach. When mapping from a PV in Base N1 to a PV in Base N2, the population member with the greatest probability of being generated by the PV was first calculated. This was converted to Base N2, yielding a totally converged PV. Naturally, this (converged) PV could only generate one value; thus the PV elements were mutated slightly by a value inversely proportional to the base, N.

C. Tests Performed

PBIL and MB-PBIL were implemented in MATLAB. The performance of MB-PBIL relative to standard PBIL was tested using three standard optimisation functions:
- Bump 20: a 20-dimensional function defined by Keane, designed to reproduce some of the features of practical engineering design optimisation problems. [3] (Tested using PBIL and MB-PBIL, each with 20 trials of 50,000 function evaluations.)
- Michalewicz's function: a 10-dimensional function which was chosen as one of the test functions for the First International Contest on Evolutionary Optimisation (ICEO). [4] (Tested using PBIL and MB-PBIL, each with 10 trials of 100,000 function evaluations.)
- Shekel's foxholes: a 10-dimensional function, also from the ICEO, which exhibits a large number of deceptive local minima. (Tested using PBIL and MB-PBIL, each with 10 trials of 50,000 function evaluations.)

Bump 20 was also tested more extensively by selecting various population sizes and different ranges of bases in which to search. Throughout all tests, a Learning Rate of 0.1 and a Mutation Rate of 0.01 was used for PBIL (these rates were used by Baluja [1]). Similarly, for MB-PBIL, LR_2 was fixed at 0.2 and MR_2 at 0.026. Where not otherwise specified, a population size of 100 was used for PBIL, and a population size of 50 for MB-PBIL.
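The base-conversion step described in Section III-B can be sketched as follows. This is a simplified illustration, assuming each dimension's digits are stored contiguously and that enough base-N2 digits are taken to cover the base-N1 range; the helper names are not from the paper:

```python
def most_probable_member(pv):
    # The most probable member takes, in each PV element, the value
    # with the highest probability.
    return [max(range(len(elem)), key=lambda i: elem[i]) for elem in pv]

def change_base(member, n1, n2, dims):
    # Re-express each dimension's base-n1 digits as base-n2 digits.
    digits1 = len(member) // dims
    value_range = n1 ** digits1
    digits2 = 1
    while n2 ** digits2 < value_range:
        digits2 += 1
    out = []
    for d in range(dims):
        value = 0
        for digit in member[d * digits1:(d + 1) * digits1]:
            value = value * n1 + digit          # digits -> integer
        new = []
        for _ in range(digits2):
            new.append(value % n2)              # integer -> base-n2 digits
            value //= n2
        out.extend(reversed(new))
    return out

def converged_pv(member, base, mutation):
    # Build a fully converged PV from the member, then mutate it by an
    # amount inversely proportional to the base and normalise.
    pv = []
    for digit in member:
        elem = [mutation / base] * base
        elem[digit] += 1.0
        total = sum(elem)
        pv.append([p / total for p in elem])
    return pv
```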
TABLE II
The Performance of PBIL and MB-PBIL on the three test functions

    Function (max./min.)*   Method     Average Max. Fitness   Best Max. Fitness
    Bump (max.)             PBIL       0.7437                 0.7659
                            MB-PBIL    0.7698                 0.7857
    Michal. (min.)          PBIL       -9.2024                -9.4066
                            MB-PBIL    -9.6419                -9.6601
    Shek. Fox. (min.)       PBIL       -7.7768                -13.2415
                            MB-PBIL    -8.2750**              -14.1904

* Whether the function is required to be minimised or maximised.
** Although the average maximum fitness over the ten trials was -8.27, six of the ten trials only reached fitness values in the range -4.1 to -4.8, while the remaining four trials achieved fitness values less than -14.1.
IV. Results and Discussion

Table II compares the performance of PBIL and MB-PBIL on each of the three functions tested (averaged over 10 or 20 trials). It is evident that MB-PBIL performed better than PBIL in terms of the average maximum fitness achieved, across all of the functions. This performance was achieved without fine-tuning the parameters of MB-PBIL. A reasonable hypothesis for the superior performance of MB-PBIL would be that different areas of the functions' search space exhibit a "topology" more suited to exploration in a particular base. However, this is difficult to prove, since it would require a detailed analysis of the functions concerned.

Of particular interest is the performance of MB-PBIL on the Shekel's foxholes function, where highly optimal values of less than -14.1 were found on four of the trials, while the remaining six trials returned mediocre results. Although the performance of MB-PBIL was inconsistent for this function, it nevertheless averaged better than PBIL (which produced a much more even spread of fitness results).

Further detailed results of the performance of PBIL and MB-PBIL on "Bump 20" are given in Tables III, IV and V. Several points are worth noting. Relative to PBIL, MB-PBIL performs very well with low population sizes (as low as 5). This is because MB-PBIL returns to search near the best value found if no better value is found within 20 generations. Table IV shows that for population sizes of 20 and 50, using a wide range of bases (2 to 16) gave the best results; for a population size of 5, bases 2 to 11 yielded the best performance. For all three population sizes, using only four bases (bases 2 to 5) achieved the worst results. This seems to confirm that searching the space in a larger variety of bases may help to achieve a view of the space that allows for easier optimisation of the function. In seven trials of one million evaluations each, MB-PBIL achieved an average maximum value of 0.8021, while the best result from the seven trials was a value of 0.8035.

TABLE III
The Performance of PBIL on "Bump 20" (Averaged over 20 trials of 50,000 function evaluations)

    Population Size    Max. Fitness
    5                  0.5732
    20                 0.6939
    50                 0.7381
    100                0.7437
    200                0.7561

TABLE IV
The Performance of MB-PBIL on "Bump 20" (Averaged over 20 trials of 50,000 function evaluations)

    Population Size    Bases 2-5    Bases 2-11    Bases 2-16
    5                  0.7615       0.7673        0.7663
    20                 0.7605       0.7644        0.7722
    50                 0.7598       0.7634        0.7698
TABLE V
The Performance of MB-PBIL on "Bump 20" (Averaged over seven trials)

    Function evaluations (thousands)    Average Max. Fitness    Best Max. Fitness
    100                                 0.7850                  0.8020
    200                                 0.7902                  0.8029
    300                                 0.7975                  0.8030
    500                                 0.8011                  0.8032
    1000                                0.8021                  0.8035

V. Conclusion
This paper has described a generalisation of PBIL to Base-N. Following this, a Multiple Base version of PBIL was presented, which attempts to optimise a function by searching in a variety of bases. It was shown that, on average, MB-PBIL performed better than standard PBIL on all of the functions tested. It is recommended that the conversion of a Probability Vector from one base to another be researched further, and that the performance of MB-PBIL be tested on a larger selection of functions (with a variety of learning and mutation rates).

Acknowledgements
The authors wish to thank Ebrahim Jakoet, Praven Reddy, Birgit Seaman and Guy Saban (all of UCT) for providing code for the test functions.

References
[1] S. Baluja, "Population based incremental learning," tech. rep., Carnegie Mellon University, 1994.
[2] J. Kingdon and L. Dekker, "The shape of space," tech. rep., Intelligent Systems Lab, University College London, 1995.
[3] Keane, "A brief comparison of some evolutionary optimisation methods," tech. rep., Department of Engineering Science, University of Oxford, 1995.
[4] S. Langerman, "First international contest on evolutionary optimisation." http://iridia.ulb.ac.be/langerman.ICEO.html.
[5] S. Baluja, "Genetic algorithms and explicit search statistics," tech. rep., Carnegie Mellon University, 1995.