Hitchhikers Get Around

Kanta Vekaria and Chris Clack
Department of Computer Science, University College London
Gower Street, London WC1E 6BT, United Kingdom
Email: {K.Vekaria, C.Clack}@cs.ucl.ac.uk

Abstract. Previous studies have shown that hitchhiking in standard GAs can cause premature convergence. In this study we show that hitchhiking also occurs in adaptive operators, specifically selective crossover, an adaptive recombination operator. We compare selective crossover with uniform crossover (a highly disruptive operator) and show that although selective crossover favours hitchhikers more than uniform crossover does, it does not perform worse. Its constant adaptive behaviour and evolvability allow selective crossover to suppress hitchhiking and thereby yield better performance. Further experiments show that hitchhiking cannot be eliminated completely in selective crossover, but it can be reduced, further increasing performance.
1. Introduction

Traditional genetic algorithm (GA) theory describes how schemas are sampled according to their observed fitnesses by the processes of selection and recombination, a property known as implicit parallelism (Holland 1975). However, GAs have been known to converge prematurely to locally optimal solutions. This behaviour may be the result of schemas being sampled at rates that are not justified by their static fitnesses. Undesirable schemas can become coupled with desirable schemas during recombination, producing an above-average individual that is later sampled at a higher rate during selection; this in turn produces more instances of the undesirable schemas together with the desirable ones. This phenomenon is known as "spurious correlations" or "hitchhiking" (Schaffer, Eshelman & Offnut 1991; Forrest & Mitchell 1993): the undesirable (less fit) schemas hitchhike along with the fitter schemas. Previous studies of hitchhiking have examined standard GAs (Schaffer, Eshelman & Offnut 1991; Forrest & Mitchell 1993), but now that the field of GAs has grown tremendously, many favour adaptive operators to enhance evolvability in a GA (Altenberg 1994). In this paper we show that adaptive operators also allow hitchhiking, and that it cannot be eliminated completely without a trade-off in performance. Although hitchhiking has been shown to cause premature convergence in standard GAs, it does not always have a negative impact on search if there are other biases that can suppress hitchhikers.
2. The History of Hitchhiking

Schaffer, Eshelman & Offnut (1991) first identified hitchhiking in GAs, defining it as spurious correlations that are inherent in GAs and can lead to premature convergence. They explored the effects of two-point crossover (a less disruptive operator) and uniform crossover (a highly disruptive operator) on synthetic problems that are susceptible to hitchhiking. Two-point crossover possesses considerable positional bias, meaning that schemas are less likely to be disrupted if their genes are situated close together on the chromosome; it is therefore not vigorous enough to suppress hitchhikers. Uniform crossover, however, possesses no positional bias and will tend to disrupt highly fit schemas as well as the hitchhikers. Schaffer et al. concluded that hitchhiking could not be eliminated and suggested that a population-elitist strategy, in which individuals are introduced into the population only by replacing the worst of the population, can compensate for the disruptive behaviour of uniform crossover. This allows good solutions to be retained and reduces the high sampling rate of hitchhikers.

A further study on Royal Road functions (Forrest & Mitchell, 1993) showed that the convergence of the GA was slowed down by the presence of intermediate schemas (R2). To understand the slow convergence they traced specific schemas during evolution; the results showed that the slow convergence was due to hitchhiking. They used one-point crossover, which has a high positional bias and is therefore very unlikely to disrupt hitchhikers.

Since then, many adaptive techniques have been proposed to enhance GA capabilities; however, hitchhikers also pose a threat to these techniques. Vekaria & Clack (1999) analysed adaptive recombination operators and identified biases that are inherent in these operators. One of the key biases is the hitchhiker bias.
The question we wish to ask is: does this hitchhiker bias really pose a threat to adaptive recombination operators? In the following sections we give a brief description of selective crossover (Vekaria & Clack, 1998), an adaptive recombination operator, and go on to show how hitchhiking is present in this operator. In Section 5 we compare selective crossover with uniform crossover. We also compare different versions of selective crossover and show that the hitchhiker bias can be reduced, thereby increasing the performance of selective crossover.
3. Selective Crossover

Selective crossover (Vekaria & Clack, 1998) is an adaptive recombination operator that evolves better individuals by using a confidence vector to bias alleles during
recombination. The chromosome is accompanied by a confidence vector of continuous real values, so that each gene in the chromosome has an associated confidence value. The confidence value accumulates the fitness contribution of an allele with respect to the fitness of the entire individual and the parental fitnesses. Hence, the confidence vector accumulates knowledge of what happened in previous generations and uses it to bias successful alleles during crossover into the next generation.

Recombination uses two parents to create two children. During recombination two parents are selected and their fitnesses are recorded. The confidence values of each gene in both parents are compared locus by locus along the chromosome. The allele with the higher confidence value contributes to Child1, along with its associated confidence value. To maintain diversity in the population, Child2 inherits the alleles with the lower confidence values. If both confidence values are equal then crossover does not occur at that position.

Consider an l-bit representation and let Ω = {0,1}^l be the search space. Each individual in selective crossover has a gene vector G and a confidence vector D, and the fitness function is φ(G):

  G = (g_1, ..., g_l) where g_i ∈ {0,1}
  D = (d_1, ..., d_l) where d_i ∈ R+

Given two parents chosen for crossover, G1^p and G2^p, with confidence vectors D1^p and D2^p, a crossover can be represented by inheritance masks I1 and I2, where I = (i_1, ..., i_l) with i_i ∈ {0,1}. The inheritance mask for Child1, whose gene vector is denoted G1^c, is

  I1 = ψ(ψ(D1^p − D2^p) + 1)                                          (1)

where ψ is applied componentwise:

  ψ(x)_i = 1 if x_i > 0;  −1 if x_i < 0;  0 otherwise                 (2)

The inheritance mask for Child2 (gene vector G2^c) is I2 = 1 − I1. The child gene vectors are therefore (all operations componentwise):

  G_i^c = I_i (G1^p − G2^p) + G2^p                                    (3)

and the confidence vectors are

  D_i^c = I_i (D1^p − D2^p) + D2^p + F_i                              (4)

where the credit term F_i is non-zero only at the exchanged loci, selected by the mask |I2 (G1^p − G2^p)|, which is 1 exactly where the parents' alleles differ and were swapped:

  F_i = max(0, φ(G_i^c) − φ(G_i^p)) |I2 (G1^p − G2^p)|
          if φ(G_i^c) > φ(G1^p) and φ(G_i^c) > φ(G2^p)
  F_i = max(0, φ(G_i^c) − max(φ(G1^p), φ(G2^p))) |I2 (G1^p − G2^p)|
          otherwise                                                   (5)

3.1 Example
Fig. 1 gives an example of selective crossover: at each locus the dominant allele (the one with the higher confidence value) is inherited by Child1, and to keep diversity in the population Child2 inherits the non-dominant alleles.

  Parent 1 (fitness = 0.36)   confidence: 0.4  0.3  0.01 0.9  0.1  0.2
                              genes:      1    0    0    1    0    0
  Parent 2 (fitness = 0.30)   confidence: 0.01 0.2  0.4  0.2  0.9  0.3
                              genes:      0    1    1    1    1    0

  Child 1 (fitness = 0.46)    confidence: 0.4  0.3  0.4  0.9  0.9  0.3
                              genes:      1    0    1    1    1    0
  Child 2 (fitness = 0.20)    confidence: 0.01 0.2  0.01 0.2  0.1  0.2
                              genes:      0    1    0    1    0    0

Fig. 1. Recombination with selective crossover.

  Child 1 before update       confidence: 0.4  0.3  0.4  0.9  0.9  0.3
  Child 1 after update        confidence: 0.4  0.3  0.5  0.9  1.0  0.3
  Child 2 (no fitness increase): confidence values unchanged

Fig. 2. Updating confidence values.
After crossover the two new children are evaluated. If a child's fitness is greater than the fitness of either parent, the confidence values of those alleles that were exchanged during crossover are increased proportionately to the fitness increase.¹ This reflects the alleles' contribution to the fitness increase; an example of the mechanism is shown in Fig. 2. In Fig. 2, only Child1 has an increase in fitness, of 0.1 (compared with the fittest parent), hence its confidence values get updated. In Fig. 1 the bit values of Parent1 and Parent2 at loci 1 and 2 did not get exchanged during crossover, and the bit values at loci 4 and 6 are the same. Thus, after selective crossover, the only alleles that caused a change in the chromosome are those at loci 3 and 5. Since the change of the alleles at loci 3 and 5 resulted in an increase in fitness, only their confidence values get increased in Child1.

On initialisation the confidence values are randomly generated, as is the population, but are restricted to the range [0,1]. By doing this we allow the GA to explore the search space by evolving the confidence values, determining and promoting those genes that are considered fit. Child2 is needed so that important information is not lost in early generations, when there is more exploration than exploitation: if Child2 were to produce an increase in fitness relative to its parents, its genes would get promoted. Selection will then bias the fitter individuals and lose the least fit, along with their confidence vectors.

The confidence values are increased when fitness increases, so it might seem to follow that one should decrease the confidence values when fitness decreases. We choose not to do this because we prefer not to introduce a strong bias during the early (highly explorative) generations. In our example (Fig. 2) Child2 showed a fitness decrease. This child may be a prospective parent in the next generation. By not decreasing the confidence values we still allow its genes to compete with other genes (at the same locus in the population).
If we were to decrease them, those genes might never get chosen, introducing a strong bias and limiting exploration in early generations. Unlike one-point or two-point crossover, selective crossover is not biased against schemas with a high defining length (as defined by Holland (1975)). Selective crossover propagates good schemas regardless of their defining length: for example, a schema consisting of interacting genes at the two extremes of the chromosome can be propagated as easily as one whose interacting genes are located adjacent to each other. Selective crossover can be considered an extension of uniform crossover: with selective crossover the probability of crossing over at a position depends on what happened in previous generations, whilst in uniform crossover the probability is fixed throughout (traditionally at 0.5).
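As a concrete illustration, the recombination rule (Eqs. 1-3) and the credit-assignment rule (Eqs. 4-5) can be sketched in Python. This is our own minimal reconstruction of the operator as described above, not the authors' implementation; the function names are ours.

```python
def selective_crossover(g1, d1, g2, d2):
    """Recombine two parents, each given as a gene vector and a
    confidence vector.  At every locus Child 1 inherits the allele
    with the higher confidence value (ties favour Parent 1, i.e. no
    crossover there); Child 2 inherits the other allele,
    preserving diversity."""
    c1g, c1d, c2g, c2d = [], [], [], []
    for a1, x1, a2, x2 in zip(g1, d1, g2, d2):
        if x1 >= x2:
            c1g.append(a1); c1d.append(x1)
            c2g.append(a2); c2d.append(x2)
        else:
            c1g.append(a2); c1d.append(x2)
            c2g.append(a1); c2d.append(x1)
    return (c1g, c1d), (c2g, c2d)


def update_confidence(child_g, child_d, child_fit,
                      direct_g, direct_fit, other_fit):
    """Credit assignment after evaluation: if the child beats both
    parents, every exchanged allele (one that differs from the direct
    parent) has its confidence raised by the fitness increase over
    the direct parent (cf. Eq. 5 and its special case)."""
    if child_fit > direct_fit and child_fit > other_fit:
        delta = child_fit - direct_fit
        for i, allele in enumerate(child_g):
            if allele != direct_g[i]:      # this locus was exchanged
                child_d[i] += delta
    return child_d
```

This sketch reproduces Figs. 1 and 2: with the parents of Fig. 1, Child 1 becomes genes [1,0,1,1,1,0] with confidences [0.4,0.3,0.4,0.9,0.9,0.3], and the update for Child 1's fitness of 0.46 raises only the confidences at loci 3 and 5.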
4. Biases in Selective Crossover

Vekaria & Clack (1999) identified four key biases inherent in selective crossover: directional bias, credit bias, initialisation bias, and hitchhiker bias. A directional bias exists if alleles are favoured (or not favoured) for their credibility. A credit bias is the degree to which an allele gets favoured with respect to its credibility. An initialisation bias exists if alleles are favoured, during initialisation, without knowing their credibility. A hitchhiker bias exists if alleles get favoured when they are not the cause of a fitness increase. These four biases are linked together as shown in Fig. 3. The directional, credit and hitchhiker biases are not independent: each is a direct result of another. The initialisation bias does not fall within this group, as it is introduced independently into the adaptive technique; it is nevertheless also a cause of the hitchhiker bias, because alleles are assigned credit when they may not be fit alleles.

¹ As can be seen from Eq. 5, there is a special case where the child's fitness is greater than that of both parents. In this case the fitness increase is calculated as the fitness difference between the child and its direct parent.
Fig. 3: Relationships between biases. The directional bias (DB) gives rise to the credit bias (CB), which gives rise to the hitchhiker bias (HB); the initialisation bias (IB) independently also feeds the hitchhiker bias.
The hitchhiker bias is inherent in selective crossover because selective crossover increases the confidence values of only those alleles that were exchanged during recombination and resulted in a fitness increase in the child, and the size of the increase is determined by the fitness increase relative to the parents. Consider the One-Max problem, where schemas containing 1s are fitter than those containing 0s. If a 0 is introduced into a child along with three 1s, the fitness will increase and so will the confidence values: the confidence values of the three 1s and of the 0 all get increased by the same amount (the fitness increase). Hence in subsequent generations the 0 will have a high confidence value and get passed down to future generations. For the One-Max problem such an event is not desirable. For example, in Fig. 4 four alleles were exchanged to create Child1 and Child2. Given the One-Max problem, if Child1 has a fitness increase, the 0 that was also exchanged at locus 2 will get favoured: the confidence values of all four exchanged alleles will increase by the same amount. This means the 0 at locus 2 is hitchhiking; it did not contribute to the fitness increase. For this reason, we view the hitchhiker bias as being detrimental to the evolutionary process.
  Child 1 (fitness = 4)   confidence: 1.2  0.4  0.8  3.0  1.0  0.3
                          genes:      1    0    1    1    1    0
  Child 2 (fitness = 2)   confidence: 0.11 0.3  0.01 1.8  0.1  0.2
                          genes:      0    1    0    1    0    0

Fig. 4: Hitchhiker bias. The four exchanged alleles are those at loci 1, 2, 3 and 5 (the loci at which the two children differ).
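The scenario of Fig. 4 is easy to reproduce in code. The following toy Python fragment is our own illustration using the credit rule described above (variable names are ours); note that on One-Max the net fitness increase from swapping in three 1s and one 0 is 2, since one 1 is lost in the exchange.

```python
def one_max(genes):
    """One-Max fitness: the number of 1s in the chromosome."""
    return sum(genes)

# Direct parent and the child produced by one recombination event:
# the exchange introduced 1s at loci 1, 3 and 5 and a 0 at locus 2.
parent = [0, 1, 0, 1, 0, 0]                      # fitness 2
child  = [1, 0, 1, 1, 1, 0]                      # fitness 4
confidence = [1.2, 0.4, 0.8, 3.0, 1.0, 0.3]      # child's confidence vector

delta = one_max(child) - one_max(parent)         # fitness increase: 2
for i in range(len(child)):
    if child[i] != parent[i]:                    # exchanged locus
        confidence[i] += delta

# The 0 at locus 2 lowered the One-Max fitness, yet its confidence
# rose together with the three 1s: it is a hitchhiker.
```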
Initialisation bias exists in selective crossover because all confidence values are assigned randomly to the genes, constrained to lie in the range [0,1]. This form of initialisation can cause hitchhiking because a confidence has already been assigned to an allele without knowledge of its true fitness contribution.
5. Experimental Details

From the relationship in Fig. 3 it is clear that selective crossover is exposed to two doses of the hitchhiker bias: one from the initialisation bias and the other from the credit bias (which uses correlations between parental and offspring fitnesses and provides the adaptive behaviour in selective crossover). To show empirically that adaptation can suppress hitchhiking, we decompose selective crossover into two algorithms by splitting the relationship in Fig. 3. We compare three variations of selective crossover: the original selective crossover, selective crossover with the initialisation and hitchhiker biases only, and selective crossover without the initialisation bias. We also compare uniform crossover (which is not adaptive and is more disruptive against hitchhikers) with selective crossover (which is less disruptive against hitchhikers). The GA used in this study uses fitness proportionate selection (the SUS algorithm of Baker (1987)); therefore all operators in this study are also influenced by the hitchhiking caused by unreasonable sampling of schemas due to selection.

Original selective crossover: This refers to the operator as described by Vekaria & Clack (1998) and summarised in Section 3. Pseudocode for the GA is given in Algorithm 1.

Selective crossover with initialisation and hitchhiker bias only: In this algorithm we eliminate all biases in selective crossover except the initialisation and hitchhiker biases (see Fig. 5). Confidence values do not get increased when there is a fitness increase (there is no credit bias), so the operator is not adaptive. The pseudocode of the original selective crossover GA is modified to Algorithm 2.
  Initialise confidence values randomly, restricted to the range [0,1].
  Until solution found {
      Fitness proportionate selection
      0.01% Mutation
      60% Recombination (as shown in Fig. 1)
      Evaluation
      Update confidence values (as shown in Fig. 2)
  }

Algorithm 1: Original selective crossover.
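The selection step uses Baker's stochastic universal sampling (SUS). A minimal sketch of SUS, our own rather than anything from the paper, is:

```python
import random

def sus_select(population, fitnesses, n):
    """Stochastic universal sampling: spin n equally spaced pointers
    around the cumulative-fitness wheel in a single pass, so each
    individual receives a number of copies proportional to its
    fitness, with much lower variance than roulette-wheel sampling."""
    total = sum(fitnesses)
    step = total / n
    start = random.uniform(0, step)           # one random spin
    selected, cumulative, idx = [], fitnesses[0], 0
    for k in range(n):
        pointer = start + k * step
        while pointer > cumulative:           # advance to the slot
            idx += 1
            cumulative += fitnesses[idx]
        selected.append(population[idx])
    return selected
```

For example, with fitnesses [1.0, 1.0, 2.0] and n = 4, the individual with fitness 2.0 receives exactly two of the four slots regardless of the random spin, which is the low-variance property that motivates SUS.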
Fig. 5: Initialisation and hitchhiker biases only in selective crossover: the initialisation bias (IB) feeds the hitchhiker bias (HB).
  Initialise confidence values randomly, restricted to the range [0,1].
  Until population has completely converged {
      Fitness proportionate selection
      0.01% Mutation
      60% Recombination (as shown in Fig. 1)
      Evaluation
  }

Algorithm 2: Initialisation and hitchhiker bias only.
Selective crossover without the initialisation bias: In this algorithm we eliminate the initialisation bias in selective crossover (Vekaria & Clack 1999), thereby reducing the effects of the hitchhiker bias. The initialisation bias was eliminated so that the only biases present were those shown in Fig. 6. Here, all confidence values are initialised to zero, and in the first generation uniform crossover is performed on the population (uniform crossover is used only in this first generation because all confidence values are equal and therefore crossover cannot occur as shown in Fig. 1). The population is then evaluated and the confidence values are updated as shown in Fig. 2. In subsequent generations selective crossover is used as shown in Fig. 1. The pseudocode of the original selective crossover GA is modified to Algorithm 3.
Fig. 6: Selective crossover without the initialisation bias: the directional bias (DB) gives rise to the credit bias (CB), which gives rise to the hitchhiker bias (HB).
  Initialise confidence values to 0.
  Until solution found {
      Fitness proportionate selection
      0.01% Mutation
      If generation == 1 {
          60% Uniform crossover
      }
      60% Recombination (as shown in Fig. 1)
      Evaluation
      Update confidence values (as shown in Fig. 2)
  }

Algorithm 3: Selective crossover without initialisation bias.
Uniform crossover: Here the GA uses uniform crossover (Syswerda 1989) with a 0.5 probability of exchange at each locus. We use this as a comparison because the study by Schaffer, Eshelman & Offnut (1991) showed that uniform crossover is very disruptive against hitchhiking and therefore limits premature convergence.
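For reference, uniform crossover with exchange probability 0.5 can be sketched as follows (our own minimal version of Syswerda's operator):

```python
import random

def uniform_crossover(g1, g2, p_exchange=0.5):
    """Uniform crossover: at each locus, independently swap the two
    parents' alleles with probability p_exchange.  There is no
    positional bias, so schemas are disrupted regardless of their
    defining length."""
    c1, c2 = list(g1), list(g2)
    for i in range(len(c1)):
        if random.random() < p_exchange:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2
```

Whatever positions are exchanged, each locus of the two children always holds exactly the two parental alleles, so allele frequencies in the population are unchanged by the operator itself.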
6. Results

The results are shown in Table 1. All algorithms were applied to a Royal Road function (R2) (Forrest & Mitchell, 1993); the chromosome length was 64 and the population size was 128. Table 1 shows the average fitness of the best solutions found and the average number of evaluations taken to find the solution; these are averages over 50 independent runs.

The results show that the "initialisation and hitchhiker bias only" algorithm was the worst performer: in no run did it find the optimal solution, and the average solution found was 56.00. The original selective crossover, which contains all the biases depicted in Fig. 3, found the optimal solution in 64598 evaluations on average. In comparison, when the initialisation bias was removed, thus reducing the hitchhiker bias (selective crossover without the initialisation bias), the GA found the optimal solution in fewer evaluations: 55857 on average. Uniform crossover, which is very disruptive against hitchhikers, performed worse than the original selective crossover, even though selective crossover possesses two doses of the hitchhiker bias, one from the initialisation bias and the other from the credit bias (see Fig. 3).

Table 1: The effect of the hitchhiker bias on GA search. The maximum achievable fitness is shown in parentheses.

  Royal Road R2 (192.00)
  Algorithm                                         Average best     Average number
                                                    solution found   of evaluations
  Uniform crossover                                 192.00           74128
  Original selective crossover                      192.00           64598
  Selective crossover without initialisation bias   192.00           55857
  Initialisation bias and hitchhiker bias only       86.20           --------
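The fitness function can be reconstructed as follows. This is our reading of Forrest & Mitchell's R2, in which complete aligned blocks of 8, 16 and 32 contiguous 1s each contribute their order; it matches the maximum fitness of 192.00 quoted in Table 1.

```python
def royal_road_r2(bits):
    """Royal Road R2 on a 64-bit chromosome: every complete block of
    8, 16 or 32 contiguous 1s (aligned to its own size) contributes
    its order, so the optimum of all 1s scores 8*8 + 4*16 + 2*32 = 192."""
    fitness, size = 0, 8
    while size <= len(bits) // 2:             # block levels: 8, 16, 32
        for start in range(0, len(bits), size):
            if all(bits[start:start + size]):
                fitness += size
        size *= 2
    return fitness
```

A string whose first 16 bits are 1 scores 32 (two order-8 blocks plus one order-16 block), illustrating how the intermediate-level schemas reward combinations of building blocks.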
7. Analysis

Both uniform and selective crossover are exposed to hitchhiking due to pejorative sampling rates by selection. Moreover, selective crossover is further exposed to a hitchhiker bias due to the initialisation bias. From the results we can see that the 'initialisation bias and hitchhiker bias only' algorithm could not find the optimal solution. This algorithm is very much like uniform crossover; the difference is that uniform crossover is very disruptive against hitchhikers and does not possess the hitchhiker bias, whereas the 'initialisation bias and hitchhiker bias only' algorithm supports hitchhikers. The positions for crossover in the 'initialisation bias and hitchhiker bias only' algorithm are defined during initialisation, and alleles are favoured during recombination without knowing their true contributions. The fact that this algorithm cannot find a solution whereas uniform crossover can provides a strong indication that the hitchhiker bias hinders search.

Further evidence of the deleterious effect of the hitchhiker bias can be found when we compare the performance of the 'original selective crossover' with 'selective crossover without initialisation bias'. By eliminating the initialisation bias (Fig. 6) we have reduced the hitchhiker bias and thereby increased the performance of selective crossover. On the other hand, due to the mechanism adopted by selective crossover, where the correlation between parental and offspring fitnesses is used to increase confidence in an allele's fitness contribution, selective crossover is still exposed to the hitchhiker bias (as mentioned in Section 4). The results suggest that the hitchhiker bias restricts search in selective crossover and should be eliminated. However, the hitchhiker bias in selective crossover cannot be eliminated completely, since it is an unavoidable consequence of the credit bias.

Uniform crossover is very disruptive against hitchhiking from selection sampling but performed worse than selective crossover with or without the initialisation bias. Since neither operator is biased against schemas of high defining length, our intuition is that selective crossover does better than uniform crossover because of the adaptive preservation characteristics imposed by the directional and credit biases. Selective crossover implicitly stores knowledge in the confidence values by using correlations between parental and offspring fitnesses. This allows selective crossover to suppress the hitchhiker bias.

The hitchhiker bias is suppressed in selective crossover because of the continuous nature of the confidence values. The confidence values are not bounded by predefined limits; they are allowed to increase indefinitely with each fitness increase. This implies that there is a great deal of competition amongst alleles for a position on a chromosome during recombination. The confidence value of an allele a at position i in individual x is not necessarily the same as that of allele a at position i in individual y; each confidence value is the result of correlations in previous generations. For example, in Fig. 4 the allele '1' is superior to the allele '0', yet the confidence value of the '0' at locus 2 was increased because of a fitness increase caused by the three '1's introduced into the chromosome along with the '0'.
This is an example of one recombination event in the entire population. Other recombination events may have combined a superior '1' at locus 2; in the next generation, the hitchhiking '0' is then suppressed by the higher confidence values of superior alleles present at the same locus. A previous study of the distribution of the confidence values (Vekaria 1998) showed that the distribution was constantly changing across the generations. Interestingly, this change in distribution was not the same for different problems: the confidence values were adapting to the specific problem, so the behaviour of selective crossover differed for each problem, yielding superior performance.
8. Conclusions

We have studied hitchhiking in selective crossover, an adaptive recombination operator that evolves better individuals by using a confidence vector to bias alleles during recombination. The results show that selective crossover is exposed to the hitchhiker bias and that this bias hinders its performance. Selective crossover possesses two doses of the hitchhiker bias: one from the initialisation method used and the other from the credit bias, which uses correlations between parental and offspring fitnesses. The experiments show that the hitchhiker bias is a hindrance to selective crossover. We proposed a method to reduce this bias by using an alternative method of initialisation. The study showed that eliminating the initialisation bias reduces the hitchhiker bias, which increased the performance of selective crossover; however, as the hitchhiker bias is an unavoidable consequence of the credit bias, it cannot be eliminated completely without a trade-off in performance.

Uniform crossover does not possess the hitchhiker bias but performs worse than selective crossover. Both uniform and selective crossover are susceptible to hitchhiking due to pejorative sampling rates from selection. Although uniform crossover is highly disruptive against this form of hitchhiking, it was still out-performed by selective crossover. This may be due to the additional directional and credit biases imposed by selective crossover, which allow it to suppress hitchhiking. We conclude that the hitchhiker bias is also a problem for selective crossover, but it cannot be eliminated completely without a trade-off in performance. The adaptive nature of selective crossover, and the exploitative biases it imposes on search through the confidence vector, provide selective crossover with the ability to suppress hitchhiking and yield better performance.
Acknowledgements This research was supported by the Engineering and Physical Sciences Research Council. We also thank Richard Watson for his invaluable comments.
References

Altenberg, L. (1994) Evolving Better Representations through Selective Genome Growth. In Proceedings of the IEEE World Congress on Computational Intelligence, 182-187.

Forrest, S. & Mitchell, M. (1993) Relative Building-Block Fitness and the Building Block Hypothesis. In L. D. Whitley, editor, Foundations of Genetic Algorithms 2, 109-126. Morgan Kaufmann.

Holland, J. H. (1975) Adaptation in Natural and Artificial Systems. MIT Press.

Schaffer, D. J., Eshelman, L. J. & Offnut, D. (1991) Spurious Correlations and Premature Convergence in Genetic Algorithms. In G. Rawlins, editor, Foundations of Genetic Algorithms, 102-112. Morgan Kaufmann.

Syswerda, G. (1989) Uniform Crossover in Genetic Algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, 10-19. Morgan Kaufmann.

Vekaria, K. & Clack, C. (1998) Selective Crossover in Genetic Algorithms: An Empirical Study. In Eiben et al., editors, Proceedings of the 5th Conference on Parallel Problem Solving from Nature, 438-447. Springer-Verlag.

Vekaria, K. (1998) Increasing the Reliability of Genetic Algorithms Using Adaptive Recombination. Internal Note IN/98/2, Dept. of Computer Science, University College London, UK.

Vekaria, K. & Clack, C. (1999) Biases Introduced by Adaptive Recombination Operators. To appear in Banzhaf et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann.