Hierarchically-Consistent GA Test Problems

Richard A. Watson
Jordan B. Pollack
Dynamical and Evolutionary Machine Organization, Department of Computer Science, Brandeis University, Waltham, Massachusetts, USA. {richardw, pollack}@cs.brandeis.edu
Abstract

The Building-Block Hypothesis suggests that GAs perform well when they are able to identify above-average-fitness low-order schemata and recombine them to produce higher-order schemata of higher fitness. We suppose that the recombinative process continues recursively: combining schemata of successively higher orders as search progresses. Historically, attempts to illustrate this intuitively straightforward process on abstract test problems, most notably the Royal Road problems, have been somewhat perplexing, and more recent building-block test problems have abandoned the multi-level hierarchical structure of the Royal Roads, and thus departed from the original recursive aspects of the hypothesis. This paper defines the concept of hierarchical consistency, which captures the recursive nature of problems implied by the Building-Block Hypothesis. Hierarchical consistency causes us to re-think some of the concepts of problem difficulty that have become established - for example, order-k delineation, and deception as defined with respect to a single global optimum. We introduce several variants of problems that are hierarchically consistent, and discuss the concepts of problem difficulty with respect to these models. Experimental results begin to explore the effects that these variations have on GA performance.
Introduction

The Building-Block Hypothesis [Holland 1975, Goldberg 1989] suggests that GAs perform well when they are able to identify above-average-fitness low-order schemata and recombine them to produce higher-order schemata of higher fitness. We suppose that this process continues recursively: combining schemata of successively higher order as search progresses. More simply, the GA first finds solutions to small sub-problems and then puts these together to find solutions to bigger sub-problems, and so on until it finds a complete solution. The Royal Road (RR) problems [Mitchell et al 1992, Forrest & Mitchell 1993] were an early attempt to illustrate the hierarchical process of building-block discovery and recombination. However, it was apparent in the early work on RRs that there was something amiss [Forrest & Mitchell 1993b]. The functions were intended to exemplify the class of problems that require the recombinative aspects of the GA. However, non-recombinative algorithms, such as the Random Mutation Hill-Climber (RMHC), outperformed the GA on the Royal Roads. This contributed significantly to the controversy surrounding the Building-Block Hypothesis and the utility of recombination in the GA [Goldberg and Horn 1994, Mitchell et al 1995, Altenberg 1995, Jones 1995].
The following section revisits some of the issues involved in the Royal Roads and more recent building-block problems, and introduces some of the important established concepts in defining problem difficulty. We argue that the reason these models fail to exemplify the benefit of recombination is that they fail to follow through with the recursive implication of the Building-Block Hypothesis - at least, they do not follow through consistently. This paper builds upon the work presented in [Watson et al 1998] and the hierarchically consistent test problem, H-IFF, introduced therein. The next section reviews the definition of H-IFF, and the concept of hierarchical consistency is defined. Following on from the work of Forrest and Mitchell [1993b], subsequent sections address the question, 'What makes a hierarchically consistent problem hard for a GA?' Subsequent sections expand the canonical form of H-IFF into other hierarchically consistent functions, and more general cases, that allow us to vary the difficulty of the problem. Our experiments use these functions and vary several parameters related to problem difficulty to examine the operation of the GA.
Background

The primary suspect for the failure of the Royal Roads was their lack of deception. Deception comes in more than one variety [Horn & Goldberg 1994] but the general idea is that "a deceptive function is one in which low-order schema fitness averages favor a particular local optimum, but the global optimum is located at that optimum's complement" [Goldberg et al 1989]. The Royal Roads do not exhibit any local optima, and, with the benefit of hindsight, it is not surprising that a variant of a hill-climber could solve them. Accordingly, the original Royal Roads have largely been discarded, their purpose served - a stepping stone on the way to better understanding. They have been replaced by building-block functions which have some degree of deception, as we shall discuss. But the Royal Roads attempted to capture a property that other building-block functions have not addressed - specifically, they included a multi-level hierarchical structure. Indeed, deception seems to preclude a clear understanding of how to construct a multi-level problem. Multi-level deception implies deception on the large scale, which seems simply too hard. And the concept of order-k delineation formalizes the belief that there are no higher levels of interdependency in a problem - K, in this concept, is "the highest order deceptive nonlinearity suspected in the subject problem" [Goldberg 1989]. In order to regain the hierarchical aspects of the Building-Block Hypothesis in a test problem we need to go back to the RRs and re-examine the misconception that led to their downfall. We shall also compare them to the alternative building-block functions - specifically, concatenated deceptive functions [Goldberg 1989].
Hierarchical combinations?

The Royal Road functions consist of a set of building-blocks - above-average-fitness schemata of short defining length [Holland 1975, Goldberg 1989]. There are eight blocks of eight bits each. These blocks have a corresponding reward structure that consists of fitness contributions for each block, and in the second version of the Royal Road problem, R2 [Mitchell and Forrest 1993], there are also bonus fitness contributions for particular combinations of blocks. Unfortunately, although this sounds like the right idea for a hierarchical building-block problem, its implementation was flawed. Though it is true that they featured no deceptive aspects, we present an alternative view of their shortcoming. The solution to each block is defined by a particular combination of bits, and the number of possible combinations of 8 bits is 256. This defines the problem at the first level. The problem at the next level concerns combinations of blocks. There are eight blocks in the second level (of the R1 variant) just as there are eight bits in the first level - so far, the problem is consistent. However, the 'combinations' of blocks that are rewarded are not (and cannot be) defined in a manner consistent with the bit level. This is because there is only one above-average-fitness schema for each partition corresponding to a block, whereas there were two possible values for each bit. Finding 'combinations' of blocks when each block has only one rewarded configuration is quite different from finding the right combination of bits where each bit has, of course, two possible values (where no particular bit-value is preferred). Since the fitness contributions identify the correct configuration of each block uniquely, a block can only take one value, as it were, and there is in fact only one possible combination of the 8 blocks that the first level identifies.
To put it another way, the blocks are separable - meaning that the optimal setting for the bits in one block is not dependent on the setting of bits in any other block.1 Moreover, it is not possible to define any interdependency between blocks (i.e. to make blocks non-separable) when the solution to each block is identified uniquely [Watson et al 1998].

1 Note that this observation does not contradict the fact that the fitness contribution of a block may vary as a non-linear function of the settings of bits in other blocks [Whitley 1995].
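The separability point can be seen in a minimal sketch of an R1-style reward structure (a simplified stand-in for illustration, not the exact Royal Road definition): the fitness is a simple sum of independent per-block terms, so each block can be optimized without reference to any other.

```python
def r1_like(bits, block_size=8):
    """Simplified R1-style fitness: each all-ones block contributes
    block_size, independently of the state of every other block."""
    total = 0
    for i in range(0, len(bits), block_size):
        block = bits[i:i + block_size]
        if all(b == 1 for b in block):
            total += block_size
    return total
```

Because the total is a sum over blocks, the optimal setting of any one block is the same regardless of the remaining bits - the hallmark of separability.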
Competing schemata

Interestingly, the building-block problems that incorporate deception (e.g. concatenated trap functions) do reward more than one combination of bits within each block. Specifically, the deceptive schemata are the complements of the schemata that contain the global optima. But there it ends. The concatenated trap functions [Goldberg et al 1989] fail to follow through with the recursive implications of the Building-Block Hypothesis. Rather than define interactions between the blocks using the two possible kinds of block (in a manner similar to that used to define the interdependency between the bits), the whole problem is simply a concatenation of many of the base deceptive functions. This shortcoming is illuminated by their susceptibility to another kind of non-recombinative algorithm. Although they defeat RMHC, to which the Royal Roads fell prey, the Macro-Mutation Hill-Climber (MMHC) [Jones 1995] replaces whole chunks of the bits in the genome with random values. This enables the MMHC to escape the local optima created by the deceptive base functions. Since there are no interdependencies of higher order than k, the size of the base functions, this is sufficient to solve the whole problem. (In fact, as we mentioned before, it is not immediately clear just how we could define interdependency higher than order-k and still have meaningful building-blocks.) In another avenue of research, ways have been found to foil the MMHC. The macro-mutation hill-climber relies on bit-adjacency as a heuristic for identifying which subset of genes to mutate. Accordingly, by shuffling the bit order we can defeat the MMHC - but unfortunately, this also defeats the GA. The GA also relies on bit adjacency for effective recombination; as mentioned, building-blocks are defined as low-order schemata with short defining length, i.e. the genes must be close together on the genome [Holland 1975, Altenberg 1995].
As a result, a whole flurry of research has been directed at algorithms that discover and represent the interdependency of bits by re-ordering them on the genome (e.g. [Harik & Goldberg 1996], [Harik 1997], [Kargupta 1997]). Whilst poor linkage, i.e. unfavorable positioning of interdependent bits, certainly is a problem for GAs and an issue that must be addressed, this line of research does not directly address the original concern of the Royal Road problems - the recursive or hierarchical nature of searching combinations.
Competing-schemata and hierarchy

The solution to these issues is really quite simple. We can combine the competing schemata found in the base level of the concatenated trap functions with the hierarchical aspect of the Royal Roads. The Hierarchical-if-and-only-if (H-IFF) problem, introduced by Watson et al [1998], has a hierarchical building-block structure not unlike R2. However, H-IFF defines two above-average-fitness configurations for each block - analogous to the competing schemata in the deceptive traps. Then fitness contributions at the next level in the hierarchy are defined via combinations of these two kinds of block. In this manner the interdependency between blocks can be defined in exactly the same manner as the interdependency between bits. This produces two kinds of above-average-fitness 'meta-blocks' over which we search combinations again, and so on to eventually produce two global optima. H-IFF is the first example of a building-block test problem which is hierarchically consistent.
Hierarchical Consistency

Before defining hierarchical consistency it will assist us to review our canonical example: Hierarchical-if-and-only-if (H-IFF) [Watson et al 1998].
H-IFF

We may view H-IFF as a problem defined over successive levels of building-blocks. Briefly, at the bottom level each non-overlapping adjacent pair of bits constitutes a block and has two solutions. Specifically, the block confers a fitness contribution if the bits are either both zero or both one. Similarly, at the second level in the hierarchy, pairs of blocks from the first level confer additional fitness if they are equal, i.e. all 4 bits are zeros or all 4 are ones. Blocks of 8, 16, etc. are rewarded over subsequent levels up to the complete string. The fitness contribution for a block of two bits is 1, and the fitness contributions double at each level, keeping lock-step with the size of the block. More formally, the fitness of a string under H-IFF can be defined using the recursive function given below. This function interprets its argument as a block of bits and recursively decomposes this block into left and right halves. In this manner, a string is evaluated by summing the fitness contributions of all blocks at all levels.

        { 1,                    if |B|=1,
  f(B)= { |B| + f(BL) + f(BR),  if |B|>1 and (∀i: bi=0, or ∀i: bi=1),
        { f(BL) + f(BR),        otherwise.

where B is a block of bits, {b1, b2, ..., bn}, |B| is the size of the block, n, bi is the ith element of B, and BL and BR are the left and right halves of B (i.e. BL={b1,...,bn/2}, BR={bn/2+1,...,bn}). n must be an integer power of 2.
H-IFF is deceptively simple2: some features should be highlighted. The structure of H-IFF is very close to the structure of R2. R2 defines solutions for the base-level blocks (8 bits) as all-ones, and includes bonus fitness contributions for pairs of adjacent blocks, and fours, and finally all eight blocks. But there are critical differences between H-IFF and R2. Most importantly, in R2 there is only one solution to each block at each level. In H-IFF there are two competing high-fitness schemata for each block at each level. And adjacent blocks must be of the same type to confer fitness contributions at the next level. This seemingly minimal change is crucial in enabling the hierarchical consistency that cannot be found in R2, as suggested above. The definition of H-IFF might raise some questions: If both ones and zeros are rewarded equally, and there are two global optima, where is the misleadingness? Where is the deception? Where are the local optima? Local optima in H-IFF occur when incompatible building-blocks are brought together. For example, consider "11110000" viewed as two blocks from the previous level (i.e. size 4): both blocks are good, but when these incompatible blocks are put together they create a sub-optimal string that is maximally distant (in Hamming space) from the next best strings, i.e. "11111111" and "00000000". Deception in H-IFF is more to do with incompatibility with context than incompatibility with a global optimum as it is usually defined. Note that although local optima and global optima are distant in Hamming space, they are close in recombination space [Jones 1995]. Thus H-IFF exemplifies the class of problems for which recombinative algorithms are well-suited. The results shown in [Watson et al 1998] verify that this kind of hierarchical interdependency is hard for non-recombinative algorithms to solve (both RMHC and MMHC - or equivalently, the so-called "headless chicken test" that replaces recombination in a GA with macro-mutations). They also demonstrate that this type of problem is easy for a GA to solve given that a mechanism to maintain diversity in the population is employed.

2 'deceptive' is not intended in the formal sense here.
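The recursive definition can be transcribed directly; a minimal sketch in Python (our rendering, not the authors' code), with a string represented as a list of 0/1 values whose length is a power of 2:

```python
def h_iff(block):
    """H-IFF fitness: sum of contributions from all blocks at all levels.
    A single bit contributes 1; a larger block adds a bonus of |B| if
    all of its bits are equal, plus the fitness of its two halves."""
    n = len(block)
    if n == 1:
        return 1
    bonus = n if all(b == block[0] for b in block) else 0
    return bonus + h_iff(block[:n // 2]) + h_iff(block[n // 2:])
```

For an 8-bit string, both "11111111" and "00000000" score 32, while the local optimum "11110000" scores 24 - it loses only the top-level bonus of 8.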
The latter requirement seems reasonable, since we are intending to investigate those aspects of a GA that distinguish it from non-population-based search techniques, but we shall investigate this further in this paper. Experiments in [Watson et al 1998] also include a shuffled version of H-IFF exhibiting poor linkage - algorithms to address this additional difficulty are addressed elsewhere [Watson & Pollack 1999].
Defining hierarchical consistency

In [Watson et al 1998] a hierarchically consistent problem is described as a problem where "the nature of the problem is the same at all levels in the hierarchy." We can state this more precisely. First, let us note that a problem can only be defined with reference to a set of variables. We usually consider the variables of GA problems to correspond directly to the genes, each of which takes a number of values or alleles, generally two. And this is true for the first level, level 0, in a hierarchical problem also. The problem at level 0, then, will be to find good combinations of bits - above-average-fitness low-order schemata. However, at the second level, the problem is to find good combinations of schemata from the first level. The key to hierarchical consistency is to recognize that the variables over which a problem is defined scale up as we ascend levels. Using this concept of a problem defined with respect to a set of variables, we now formally define hierarchical consistency as follows:

Definition: A problem is hierarchically consistent if, for some K, the problem of finding good schemata of length L, given good schemata of length L/K, is of the same class for all L. (We will permit the limitation that L is an integer power of K, and naturally, L ≤ N, the total size of the problem.)

This definition does not dictate which class of problem is involved at each level: it will likely include some kind of interdependency between variables, assuming we want the problem to be hard; the partitions of sub-problems may be neat or not; it may include linkage difficulties; it may be real-valued or discrete, or whatever. The definition is merely concerned with the consistency of the problem at different scales of resolution. In fact, there may not even be strict levels of hierarchy - the restriction that L is a power of K implies distinct hierarchical levels - but this is not required for a problem to be hierarchically consistent. And we can even permit open-ended problems that have no fixed N, nor therefore any upper limit on L. Let us examine the definition with respect to H-IFF. In H-IFF the problem at each level is a combination problem. The lengths of blocks double at each level, i.e. K=2. Naturally, in H-IFF the solutions to the level-1 blocks3 are defined by specific combinations of bit values (bits being level-0 blocks). In H-IFF there are 2 bits in each level-1 block and, to confer fitness, both bits must be of the same kind.
There are two possible ways for this to be the case, i.e. "00" and "11" - so there are two varieties of solution for each level-1 block, the 0 kind and the 1 kind. This is analogous to the two varieties of bits at level 0. So, we can define the fitness contribution for level-2 blocks (size 4) using specific combinations of level-1 blocks in exactly the same way as we did with the bits at the first level - i.e. the solution for a level-2 block is that it contains two level-1 blocks of the same kind. Thus we see that the problem of finding solutions to level-1 blocks given solutions to level-0 blocks is of the same class as the problem of finding solutions to level-2 blocks given solutions to level-1 blocks. This consistency is maintained through all levels.

3 Since in H-IFF a schema either confers a fitness contribution or it does not, we shall refer to 'above-average-fitness schemata' as solutions. And we shall use the term block to refer to the partition corresponding to solutions.
Now we can see why it is not possible for R2 to be hierarchically consistent. Of course, the smallest base for a set of variables where finding combinations is problematic is two. Hence R2, like almost every GA test problem, uses particular combinations of bits to define the solution to the level-1 blocks. (If each gene could take only one value, this would be a non-problem.) Yet there is only one kind of solution for each block - and so finding the right combination of blocks is, by the same criteria, a non-problem. The fact that the blocks in the Royal Road problems, and all other building-block problems [Watson et al 1998], are linearly separable is a direct result of there being only one kind of solution to each block - we cannot define non-trivial combinations of variables in base one. As mentioned, the concatenated trap functions do, in fact, have two notable above-average-fitness schemata for each block - and, just like H-IFF, these are all-1s and all-0s. But the 0s schemata are not considered to be 'solutions' - they are of no use to the next level of building-blocks - they are merely traps incorporated to foil hill-climbers. Thus the concatenated trap functions are also not hierarchically consistent. Whereas the trap functions use deception within a block to defeat hill-climbers, H-IFF uses the same kind of deception, at larger scales, to defeat macro-mutation hill-climbers [Watson et al 1998].
Constructing Hierarchically Consistent Problems

In the previous discussion it will be clear that we are interested not only in the fitness contribution of a solution but also in which solution it is. Which variety, or which 'kind', of solution we have found will determine the interdependency of this solution with the solutions to other sub-problems. In H-IFF, for example, there are two kinds of solution - the all-ones kind and the all-zeros kind. When there are two solutions of equal fitness, as in this example, it is clear we cannot define the interdependency of blocks as a function of their fitnesses - it must be a function of their kind. But we do not want there to be as many kinds of blocks as there are combinations of bits - this would mean that the blocks have no substance - they would not allow abstraction of the bits they contain. Indeed, in order for the problem of finding blocks at level 2 to be of the same class as the problem of finding blocks at level 1, the base of the variables (the number of kinds) in each of these levels should be the same. For example, if, as usual, the base of variables at level 0 is two, then there should also be two kinds of blocks at level 1. Where the number of variables in a block is K, and the base of the variables is S (the size of the alphabet), the number of possible variable settings is S^K. Of these, only S will be useful at the next level. This constitutes dimensional reduction - that is, although there are S^K possibilities for a block, only S need be considered at the next level. Since the optimal setting for each block, at any level, including the top level, is dependent on the setting of all other blocks (i.e. they have to be of the same kind to confer the optimal fitness), the optimal setting for each bit in the whole problem is dependent on the setting of every other bit. This is a seemingly impossible problem to solve without the insight of dimensional reduction. But the feature of dimensional reduction enables an algorithm to get purchase on these problems. Although every bit is dependent on every other, we need not consider all possible combinations of bits. For example, in H-IFF, at the top level we need only consider the four possible combinations of the two length-32 blocks. The Building-Block Hypothesis describes exactly this kind of dimensional reduction when it refers to combinations of schemata that supersede combinations of bits. These considerations lead us to construct hierarchically consistent problems using two functions - one defining the fitness contributions of blocks and the other defining the kind or variety of blocks. We shall call the latter the 'transform' function since it defines which kind of block a collection of bits is transformed into for the next level in the hierarchy. The transform function defines the dimensional reduction - it transforms a block of k symbols into a single symbol of the same alphabet and thus, as it is applied over successive levels, incrementally reduces the dimensionality of the problem. For example, the transform function for H-IFF reduces two symbols to one symbol as follows:

  t(a,b)= { 0,    if a=0 and b=0,
          { 1,    if a=1 and b=1,
          { null, otherwise.

Notice that t defines a non-solution, null, as well as the two solutions, 0 and 1. Importantly, to allow recursion, the definition of t holds for all a and b from the ternary alphabet {0, 1, null}. The fitness contributions are then defined over this alphabet as follows:

  f(a)= { 1, if a=1 or a=0,
        { 0, otherwise.
It is the job of the fitness function to direct search toward the useful schemata for the next level. Accordingly, H-IFF rewards the 0 kind and the 1 kind of block but does not reward the null, or non-solution, blocks. These two base functions, f and t, are utilized by two corresponding recursive functions, F and T, that define the fitness of a whole string and the transform of a whole string respectively. F and T decompose a string into its constituent blocks, and sum the fitness contributions from every block at every level. The fitness contributions are scaled according to the size of the block.
  F(B) = { f(B),                            if |B|=1,
         { |B|·f(T(B)) + Σ(i=1..k) F(Bi),   otherwise.

  T(B) = { b1,                   if |B|=1,
         { t(T(B1),...,T(Bk)),   otherwise.

where f is a base function, f: α → ℜ, giving the fitness of a single symbol; t is a base function, t: α^k → α, that defines the resultant symbol from a block of k symbols; |B| is the number of symbols in B; and Bi is the ith sub-block of B, i.e. {b(i-1)d+1, ..., bid} (where d = |B|/k).
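These recursive constructions can be sketched as follows (our Python rendering, with None standing for null, using the H-IFF base functions f and t and k=2):

```python
def t_hiff(a, b):
    """Base transform for H-IFF: two equal non-null symbols reduce
    to that symbol; any other pair is a non-solution (None)."""
    return a if (a == b and a is not None) else None

def f_hiff(a):
    """Base fitness: any non-null symbol is a rewarded solution."""
    return 1 if a is not None else 0

def T(block):
    """Recursively transform a block of symbols into a single symbol."""
    if len(block) == 1:
        return block[0]
    half = len(block) // 2
    return t_hiff(T(block[:half]), T(block[half:]))

def F(block):
    """Sum the scaled fitness contributions of all blocks at all levels."""
    if len(block) == 1:
        return f_hiff(block[0])
    half = len(block) // 2
    return len(block) * f_hiff(T(block)) + F(block[:half]) + F(block[half:])
```

With these base functions, F reproduces the canonical H-IFF fitness: for instance F over "11111111" gives 32, and over "11110000" gives 24.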
Generalizing H-IFF

Watson et al [1998] separate the recursive structure (F and T) from the specifics of the problem (f and t) in the above manner to enable us to generalize from H-IFF to other hierarchically consistent problems. We now provide alternate base functions, f and t, and discuss the features they produce in the overall problem. Our intent is to begin to dimensionalize the difficulty of hierarchically consistent problems. We shall discuss the concept of deception in the hierarchically consistent class of problems, discuss the alternative view of K (which is now not equal to the order of deception), the relative value of competing schemata, the difficulty of each block, and the interplay of blocks between one level and the next. In all cases we will only vary the base fitness and transform functions, f and t. The recursive construction functions, F and T, are used unchanged together with these base functions to complete the definition of each variant problem. F and T ensure that the resultant overall problem is hierarchically consistent.
Sections

To help us obtain an intuitive feel for the variations we are about to introduce, we shall employ cross-sections through the fitness landscapes they define. In particular, we choose a section from all zeros to all ones. For example, see the curve labeled 'hequal-2' in figure 1c (this curve, as we shall explain, is identical to that for H-IFF). The first point on this section is the fitness of the all-zeros string for a 64-bit problem; the second point is the fitness of a string starting with a single 1 followed by 63 zeros; the third, 2 leading ones and the remainder zeros; and so on. Note that in the original H-IFF the two ends of this section (i.e. all ones and all zeros) are the global optima, and that the best local optima are at half ones followed by half zeros or vice versa. A point with this second-best fitness therefore appears in the middle of the section. The recursive aspect of the functions is clear in these sections; each half section is to the whole section as each quarter section is to the half section. Although illuminating, we must be cautious in the use of these sections. We are showing only 65 points out of a total of 2^64, and only 33 local optima4 are evident out of a total of 2^32.
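A section like those described can be regenerated with a short script (a sketch, using a recursive H-IFF fitness as in the definition given in the H-IFF section):

```python
def h_iff(block):
    # H-IFF fitness: sum the block bonuses at all levels of the hierarchy
    n = len(block)
    if n == 1:
        return 1
    bonus = n if all(b == block[0] for b in block) else 0
    return bonus + h_iff(block[:n // 2]) + h_iff(block[n // 2:])

def section(n=64):
    """Fitness along the section from all zeros to all ones:
    point i is i leading ones followed by n-i zeros (65 points for n=64)."""
    return [h_iff([1] * i + [0] * (n - i)) for i in range(n + 1)]

s = section(64)
# the end points (all zeros, all ones) are the global optima;
# the midpoint (32 ones then 32 zeros) is the best point in the interior
```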
H-Equal

Logical if-and-only-if, on which H-IFF is based, is a special case of equality. A simple generalization of the base functions of H-IFF enables the definition of H-Equal - Hierarchical Equality - given below.

  t-equal({s1, s2,...,sk}) = { s,    if for all i, si=s,
                             { null, otherwise.

  f-equal(s) = { 1, if s ≠ null,
               { 0, otherwise.

Firstly, t-equal is defined for any size of alphabet (plus null), rather than the binary case of t-HIFF. Secondly, t-equal enables us to extend the definition of H-IFF to a larger number of sub-blocks per block. As well as utilizing a binary alphabet, H-IFF also uses exactly two sub-blocks per block. In the concatenated building-block functions mentioned earlier, K is used to denote the size of the base deceptive functions. This is true for our use of K also, but since this K is used consistently at all levels in the hierarchy, we could refer to it as the branching factor of the interdependencies. Note that neither H-IFF nor H-Equal (nor any of the problems defined in following sections) is order-K delineable. In fact, they are not delineable for any order of schemata (except the size of the problem). In this sense our use of K differs from the established usage in this context. H-IFF is simply the special case of H-Equal where K=2 and S=2. (The size of the alphabet, S, as we use it here means that there are S symbols in addition to null.) The sections for H-Equal shown in figure 1c include the section through H-IFF. They also show the sections for K=4 and K=8 for a 64-bit problem. S and K give us two parameters for varying the difficulty of each sub-problem in the overall problem. Of course, we can also vary the overall size of the problem, N (N must be an integer power of K). All other things being equal, an increase in the difficulty of the base problems will increase the difficulty of the overall problem. The number of solutions to each block in H-Equal is S. And the number of possible combinations of K symbols in each block is S^K.
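The H-Equal base functions can be sketched as follows (our Python rendering, with None standing for null; symbols may be drawn from any alphabet of size S):

```python
def t_equal(sub_symbols):
    """Transform for H-Equal: a block reduces to the symbol s only if
    all k of its sub-blocks reduced to that same non-null s."""
    s = sub_symbols[0]
    if s is not None and all(x == s for x in sub_symbols):
        return s
    return None

def f_equal(s):
    """Fitness contribution: any non-null symbol is a solution."""
    return 1 if s is not None else 0
```

With a binary alphabet and blocks of two sub-blocks, these reduce to the H-IFF base functions.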
We will expect therefore that the difficulty of a problem will increase as S or K increases - in either case the ratio of solutions to possible combinations decreases. However, if we keep N fixed then varying K has a side effect. For example, the number of hierarchical levels in a 64-bit H-IFF is 7 (including level-0 individual bits and level-6 blocks of 64 bits). If we increase K to 4 the number of levels is 4 - corresponding to blocks of size 1, 4, 16, and 64. And at K=8 the number of levels is only 3. We will expect that the difficulty of the overall problem will be inversely related to the number of hierarchical levels. Thus the effect of increasing K, although it surely increases the difficulty of each sub-problem, decreases the levels of recursion, assuming N is constant. S, on the other hand, is perhaps a more clear-cut increase in difficulty. Our experimental results will return to these issues.

4 Any string with an "01" or "10" block in H-IFF is not a local optimum as it can be mutated to "00" or "11", which confers higher fitness, in one bit-flip. However, any string consisting of concatenations of equal pairs ("00"s and "11"s in any order) is a local optimum since all one-bit mutations decrease fitness. There are 32 bit-pairs in a 64-bit problem, and two kinds of pair, therefore there are 2^32 local optima.
Bias
We have already noted that H-IFF uses competing schemata, like those found in deceptive functions, in a hierarchically consistent fashion. Following the lead provided by deceptive trap functions we may bias one of the two competing schemata, that is, define one to have a lower fitness contribution than the other (instead of two equal contributions as in H-IFF). Deb and Goldberg [1992b] analyze deception in trap functions in terms of the ratio of fitness contributions of the desired schema and the competing schema. Their analysis concerns only a single trap function rather than a hierarchical structure, and it assumes that only one of the two competing schemata is the real solution, the other being merely a distraction. However, we can still investigate the effect of this measure in H-IFF. f-bias, given below, operates on a binary alphabet with K=2, as in the original H-IFF, but unlike the canonical H-IFF the two competing solutions for each block are not rewarded equally. Without loss of generality, we keep the fitness of the ones blocks at 1 and decrease the fitness of the zeros blocks to some value B ∈ [0,1]. This has the effect of making one of the previous two global optima into a unique optimum and depressing the other. We shall refer to this depressed point as the global-complement since it is the bit-wise complement of the global optimum. Hierarchical consistency dictates that the ratio of the fitness of the global-complement to the fitness of the global optimum is the same as the ratio of the competing and preferred solutions to each block - i.e. B (this is seen in the landscape sections that follow). Thus we might equally refer to the bias value, B, as the competition ratio.

f-bias(a) = 1, if a=1;
            B, if a=0;
            0, otherwise.

Since H-IFF has two equal-fitness global optima we might think that having only one global optimum makes the problem harder. Conversely, the preference for schemata that contain the global optimum is expressed all the way down to the bit level, making the problem easier. In accordance with this view, we noted earlier that a problem with only one solution to each block becomes a non-problem - as B is decreased we more closely approach this case. Figure 2 a) shows sections through the H-IFF landscape for various values of B. We see that as the competition ratio approaches 0, where there are no competing schemata, the problem shows no local optima. This case is similar to the Royal Roads problem mentioned earlier, R2, where only one variety of block is rewarded but there are bonuses for pairs and additional bonuses for fours. The sections for intermediate values of B show that, because ones are favored consistently throughout all levels, the fitness landscape is merely tilted toward all-ones. For example, in the B=0.33 section, although there are local optima (unlike B=0), the optimum at 48 ones followed by 16 zeros has a higher fitness than that at 32 ones followed by 32 zeros.
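A minimal sketch of the biased H-IFF evaluation for K=2 on a binary string may help make this concrete. It assumes the recursive formulation of Watson et al [1998], in which a block of size 2^l that resolves under the transform contributes 2^l times its f-bias value; the function names are ours.

```python
def f_bias(a, B):
    # Fitness contribution of a resolved block value:
    # 1 for a ones block, B for a zeros block, 0 for an unresolved (null) block.
    if a == '1':
        return 1.0
    if a == '0':
        return B
    return 0.0

def t_iff(a, b):
    # IFF transform: a block resolves only if both halves resolve to the same value.
    return a if (a == b and a is not None) else None

def hiff_bias(s, B=1.0):
    """Recursive biased H-IFF (K=2): returns (fitness, resolved block value)."""
    if len(s) == 1:
        return f_bias(s[0], B), s[0]
    half = len(s) // 2
    fl, tl = hiff_bias(s[:half], B)
    fr, tr = hiff_bias(s[half:], B)
    t = t_iff(tl, tr)
    return fl + fr + len(s) * f_bias(t, B), t
```

Note that, as the hierarchical-consistency argument predicts, hiff_bias('0' * n, B)[0] / hiff_bias('1' * n, B)[0] comes out to exactly B for any block size n: the competition ratio at each block is the fitness ratio of the global-complement to the global optimum.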
H-XOR and biased H-XOR
When exploring the space of hierarchically consistent problems it is important to ensure that the transform of a block is a non-separable function of its arguments. If this were relaxed then, since the functions are hierarchically consistent, the entire problem would be separable. IFF, the base function of H-IFF, is one of only two non-separable functions of two bits that return a value in the same alphabet. The other is logical exclusive-or, XOR. Other two-bit functions have either only one solution or depend on only one variable. XOR is the exact negation of IFF, and as such, substituting a transform function based on XOR instead of IFF, to produce H-XOR, yields nothing interesting in itself. H-XOR, resulting from t-XOR defined below, has exactly the same properties as H-IFF except that the solutions are recursively dissimilar at all scales instead of similar at all scales. For example, the solutions for an 8-bit H-XOR are “10010110” and “01101001” (rather than “00000000” and “11111111” for H-IFF).

t-XOR(a,b) = 1,    if a=1 and b=0;
             0,    if a=0 and b=1;
             null, otherwise.

However, although t-XOR is not very interesting in itself, it is more interesting when combined with f-bias. The combination produces a problem where 1s are favored over 0s but 0s are also required. This means that the GA cannot be permitted to converge on just the ones solutions and ignore the zeros solutions: these less favorable schemata must also be discovered. The result of varying the competition ratio in H-XOR is indicated in figure 2c.5 We see that although B=1 in H-XOR is the same as B=1 in H-IFF, the effect of competition ratios less than one is quite different. It is a little difficult to describe the features of the resultant sections, but one interesting feature is that the fitness of the global-complement for B=0 is not zero, as it is in H-IFF, but approximates the average value of the section, and there are certainly still local optima.

5 Since the global optima for H-XOR are not all-ones but the recursively dissimilar strings mentioned above, the sections in figure 2c) substitute the beginning of the string “10010110...” into its complement “01101001...”, rather than substituting zeros into the all-ones string as in the other sections. Thus the sections still go from the global-complement to the global optimum.

Experiments
We shall now begin an experimental exploration of the variants defined above. We have identified the following parameters affecting the difficulty of hierarchically consistent problems:
N, the size of the problem, i.e. the number of bits (or symbols) in the problem;
S, the alphabet size;
K, the number of sub-blocks per block;
B, the ratio of the fitness contribution of the competing schema to that of the preferred schema (equal to the ratio of the fitness of the global-complement to that of the global optimum).
We have also identified the variation H-XOR and recognized that bias in H-XOR is not likely to make the problem easy in the way that it does in H-IFF.

Algorithms
Watson et al [1998] demonstrate that a regular GA succeeds easily on H-IFF if we use a mechanism to prevent premature convergence. Here we examine how the unaided GA performs and explore how much augmentation is required for the recombination of building-blocks to occur effectively. The basic GA we employ is the same in all cases (and the same as that used in [Watson et al 1998]): a generational GA with a population of 1000; exponentially scaled rank-based selection; elitism of 10%; two-point cross-over applied with a probability of 0.3; bit-wise mutation with a probability of 2/64 of assigning a random value to each gene. In addition to the performance of this basic GA we also try three augmentations concerned with promoting diversity. The first method shares out fitness through the population according to bit-wise similarity. This is similar to fitness sharing as defined by Deb & Goldberg [1989] but it is implemented using a resource-depletion model. This model was introduced by Watson et al [1998], but in that work, as in the second variant utilized here, a resource is associated with each solution to every block at every level in the hierarchy rather than only with each bit. It transpires that this is more assistance than is required to
solve the standard H-IFF and that resources at the first level alone are adequate - i.e. simply maintaining coverage of both alleles at each locus suffices - hence our first variant here. The resource model is a form of implicit fitness sharing - rather than limiting available fitness to the best individual (in a sample of the population) for each sub-problem, the resource model gives fitness to the first individual to solve a sub-problem, less to the second, and so on. This is a more suitable form of fitness sharing for problems where the sub-problems do not have ‘degrees’ of solution. The last variant is a coevolutionary model: here each individual is rewarded only for those blocks it solves that a co-evaluated individual does not solve. Both of the latter two methods use domain knowledge in their evaluation. They are not intended as general methods of fitness sharing and will not be applicable to many real-world problems. However, they are illuminating in examining the operation of the GA, and in particular the requirement for diversity in the population. In summary, the algorithms implemented are as follows:
“none” - basic GA as above with no fitness sharing.
“min” - GA with resource-based fitness sharing only at the bit level.
“all” - GA with resource-based fitness sharing for all blocks at all levels.
“coev” - GA with co-evaluation of individuals.
In all cases the algorithms are run for 300 generations and we examine the best string in the last generation. Since the fitness of strings changes across the different variants of the problem it is not meaningful to compare fitnesses. Instead we compare the sizes of the biggest all-ones block discovered. We are particularly interested in whether the algorithms are able to discover the unique size-64 all-ones block in each case. Where we vary N, we show the size of the largest block discovered as a proportion of the largest possible block, i.e. size/N. And where we vary the alphabet size S, we show the size of the biggest block of all the same symbol.
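The bit-level resource model of the “min” variant is not fully specified above; one way it might be sketched is given below. The geometric depletion schedule and function name are our own assumptions for illustration, not necessarily the schedule used in the original work.

```python
import random

def shared_bit_fitness(population, decay=0.5):
    """Illustrative bit-level resource depletion (the 'min' variant):
    each (locus, allele) pair is a resource; the first individual to
    claim it gets full credit, and later claimants geometrically less.
    The depletion schedule here is an assumption, not the paper's."""
    claims = {}                      # (locus, allele) -> prior claimants
    fitnesses = [0.0] * len(population)
    order = list(range(len(population)))
    random.shuffle(order)            # credit depends on evaluation order
    for i in order:
        total = 0.0
        for locus, allele in enumerate(population[i]):
            n = claims.get((locus, allele), 0)
            total += decay ** n      # 1, 0.5, 0.25, ... per prior claim
            claims[(locus, allele)] = n + 1
        fitnesses[i] = total
    return fitnesses
```

Two identical strings thus split the reward for every locus, while two complementary strings (which together cover both alleles at every locus) each receive full credit, which is the diversity pressure the “min” variant is meant to supply.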
[Figure 1: Varying N, problem size (in bits), S alphabet size, K number of sub-blocks per block. See text for details. Panels: a) performance vs. N, size of problem in bits (S=2, K=2); b) performance vs. S, size of alphabet (N=64, K=2); c) sections through H-IFF, various K; d) performance on H-IFF with various K (S=2, N=64).]
Results
Figure 1 shows the performance of the four algorithms for various N, S and K (in addition to the sections through various K, mentioned earlier). Figure 1a) shows performance on H-IFF for N=16, 32, 64, 128 and 256 (K=2, S=2). The first observation is that there is a general decline in performance with increasing N, as expected. Also not surprising is that the algorithm with the strongest fitness sharing performs best; in fact it succeeds for N up to 128. However, the other algorithms fare poorly; they only succeed reliably for N=16. We shall have to make the problem easier than the canonical H-IFF to see adequate performance from these algorithms with weaker diversity maintenance. Figure 1b) shows performance on H-Equal for S=2, 4 and 8 (N=64, K=2). Again it is not surprising that increasing S decreases performance, and again we see that “all” resources performs best. Figure 1d), performance on H-Equal for various K (S=2, N=64), is more interesting. It shows that when diversity in the population is maintained, increasing K up to 8 does not prevent the GA from finding the global optimum. However, when diversity is not maintained, increasing K seems to increase problem difficulty, though the other algorithms performed poorly in any case. Figure 2 shows the sections through, and the performance on, H-IFF and H-XOR for various competition ratios, B=0.0, 0.33, 0.66 and 1.0. Figure 2b) shows the performance on biased H-IFF. Recall that B=1 is the same as the original H-IFF and B=0 means that there are no competing schemata. Figure 2b) shows that the problem is indeed easier at low values of B; even the GA with no fitness sharing succeeds for low values of the
[Figure 2: Varying the competitive ratio, or Bias, in H-IFF and H-XOR landscapes. See text for details. Panels: a) sections through H-IFF with various bias; b) performance on H-IFF with various bias; c) sections through H-XOR, various bias; d) performance on H-XOR, various bias.]
competition ratio. Only as B approaches one, and the competing schemata confer equal fitness contributions, does the interdependency in H-IFF make the problem hard for the GA and resolve the abilities of the four GA types. Figure 2d) shows the performance on H-XOR with various competition ratios. We see that the effect of changing B in H-XOR is quite different from the effect of changing B in H-IFF. Specifically, even B=0 does not make the problem easier for algorithms that do not maintain diversity - though the results are a little erratic with the data collected to date. It seems likely that a treatment analogous to that provided by Deb and Goldberg [1992b] would yield a critical ratio of bias in H-IFF and H-XOR, and more illumination than these preliminary explorations.
Conclusions
This paper has examined the concept of hierarchical consistency in GA test problems. We have discussed the shortcomings of existing building-block style test functions in light of this concept and presented several alternative problems which offer various features of difficulty in a hierarchically consistent manner. The purpose of this work, like the development of other test problems, has been to provide illumination on the operation of the GA - in this case the role of recombination and the validity of the Building-Block Hypothesis. With this goal in mind it has not been our intent to convince the reader that real-world problems have a hierarchically consistent structure - but perhaps the reader will have some sympathy for the view that they do.6 Besides, unless we have a good reason to suppose that a problem is different in kind at different scales of resolution, the most parsimonious assumption is that it is of the same class at all scales. Regardless, the concepts and examples discussed here assist us in understanding the operation of the GA and the Building-Block Hypothesis. The Royal Roads, concatenated trap functions, and other building-block functions to date have not been hierarchically consistent, and concepts which have been useful in understanding the difficulty of these previous building-block functions are not directly applicable to functions which are hierarchically consistent. In particular, we have noted that the hierarchical problems we have examined are not order-K delineable for any K. Also, the ratio of the fitness contributions of competing schemata, which is important in defining deception in trap functions, has previously been analyzed only for the single-function case. We have begun to “re-dimensionalize” problem difficulty for the hierarchically-consistent framework and have explored the parallels of the existing concepts of difficulty in this context.

6 After all, it is clear that natural systems are often recursive in structure, to wit “the fractal geometry of nature” [Mandelbrot 1983], and it seems reasonable that this should induce a hierarchical structure in the problems we encounter.

Although hierarchically consistent problems are not delineable, there is still a sense in which the sub-functions have a size: we have repurposed K to mean the number of sub-blocks per block. In addition to N, the size of the problem, and S, the size of the alphabet, K provides one means to increase problem difficulty in a hierarchically consistent manner. We have also seen that varying the ratio of the fitness contributions of the optima affects problem difficulty. But the effect of this biasing on the whole problem depends on the way in which solutions at one level transfer to solutions at the next. In H-IFF, as in the trap functions, depressing the value of the competing schemata makes the whole problem easier. However, in H-XOR both of the competing schemata are required parts of the solution (instead of one being a simple distraction as in trap functions), and in this case changing the competition ratio does not affect the overall problem in the same way. The experimental results indicate that the basic GA without diversity maintenance can succeed on hierarchically consistent problems when simple biasing means that only one type of block is required. However, in the limit where there are no competing schemata there are also no local optima, and this class of problem can therefore also be solved by a hill-climber. This observation matches our intuition that when the population of a GA is allowed to converge its operation becomes less distinct from non-population-based methods, and it lends credence to the study of fitness sharing methods that maintain diversity in the population.
And, in the larger perspective, the concepts we have illustrated support the Building-Block Hypothesis and the view that population-based recombinative algorithms provide fundamentally different advantages from non-recombinative algorithms.
Acknowledgments
We are grateful to the members of DEMO at Brandeis, and also to Alden Wright, for useful discussion and insight.
References
Altenberg, L, 1995, “The Schema Theorem and Price’s Theorem”, FOGA3, editors Whitley & Vose, pp. 23-49, Morgan Kaufmann, San Francisco.
Deb, K, & Goldberg, DE, 1989, “An Investigation of Niche and Species Formation in Genetic Function Optimization”, ICGA3, San Mateo, CA: Morgan Kaufmann.
Deb, K, & Goldberg, DE, 1992, “Sufficient conditions for deceptive and easy binary functions” (IlliGAL Report No. 91009), University of Illinois, IL.
Deb, K, & Goldberg, DE, 1992b, “Analyzing Deception in Trap Functions”, in Darrell Whitley, editor, Foundations of Genetic Algorithms 2 (FOGA2), Vail, CO.
Forrest, S, & Mitchell, M, 1993, “Relative Building-block Fitness and the Building-block Hypothesis”, in Foundations of Genetic Algorithms 2, Morgan Kaufmann, San Mateo, CA.
Forrest, S, & Mitchell, M, 1993b, “What Makes a Problem Hard for a Genetic Algorithm? Some Anomalous Results and Their Explanation”, Machine Learning 13, pp. 285-319.
Goldberg, DE, 1989, “Genetic Algorithms in Search, Optimization and Machine Learning”, Reading, Massachusetts: Addison-Wesley.
Goldberg, DE, Korb, B, & Deb, K, 1989, “Messy Genetic Algorithms: Motivation, Analysis and First Results”, Complex Systems, 3, pp. 493-530.
Harik, GR, & Goldberg, DE, 1996, “Learning Linkage”, in Foundations of Genetic Algorithms 4, Morgan Kaufmann, San Mateo, CA.
Harik, GR, 1997, “Learning Gene Linkage to Efficiently Solve Problems of Bounded Difficulty Using Genetic Algorithms”, PhD dissertation, Department of Computer Science and Engineering, University of Michigan.
Holland, JH, 1975, “Adaptation in Natural and Artificial Systems”, Ann Arbor, MI: The University of Michigan Press.
Holland, JH, 1993, “Royal Road Functions”, Internet Genetic Algorithms Digest v7n22.
Horn, J, & Goldberg, DE, 1994, “Genetic Algorithm Difficulty and the Modality of Fitness Landscapes”, in Foundations of Genetic Algorithms 3, Morgan Kaufmann, San Mateo, CA.
Jones, T, 1995, Evolutionary Algorithms, Fitness Landscapes and Search, PhD dissertation, 95-05-048, University of New Mexico, Albuquerque, pp. 62-65.
Jones, T, & Forrest, S, 1995, “Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms”, ICGA-6, Morgan Kaufmann.
Kargupta, H, 1997, “Gene Expression: The Missing Link in Evolutionary Computation”, Genetic Algorithms in Engineering and Computer Science, Eds. Quagliarella, Periaux, and Winter: John Wiley and Sons.
Mandelbrot, BB, 1983, The Fractal Geometry of Nature, New York: W.H. Freeman and Company.
Michalewicz, Z, 1992, “Genetic Algorithms + Data Structures = Evolution Programs”, Springer-Verlag, New York.
Mitchell, M, Forrest, S, & Holland, JH, 1992, “The royal road for genetic algorithms: Fitness landscapes and GA performance”, Procs. of the First European Conference on Artificial Life, Cambridge, MA: MIT Press.
Mitchell, M, Holland, JH, & Forrest, S, 1995, “When Will a Genetic Algorithm Outperform Hill-climbing?”, in Advances in NIPS 6, Morgan Kaufmann, San Mateo, CA.
Watson, RA, Hornby, GS, & Pollack, JB, 1998, “Modelling Building-Block Interdependency”, PPSN V, Eds. Eiben, Back, Schoenauer, Schwefel: Springer.
Watson, RA, & Pollack, JB, 1999, “Incremental Commitment in Genetic Algorithms”, GECCO 99, submitted.
Whitley, D, Mathias, K, Rana, S, & Dzubera, J, 1995, “Building Better Test Functions”, ICGA-6, editor Eshelman, pp. 239-246, Morgan Kaufmann, San Francisco.