Test Function Generators as Embedded Landscapes

Robert B. Heckendorn, Soraya Rana, Darrell Whitley
Department of Computer Science
Colorado State University
Fort Collins, Colorado 80523 USA
(heckendo,whitley,rana)@cs.colostate.edu
Abstract

NK-landscapes and kSAT problems have been proposed as potential test problem domains for Genetic Algorithms. We demonstrate that GAs have difficulty solving both kSAT and NK-landscape problems. The construction of random kSAT and NK-landscape problems is very similar, but the differences between kSAT and NK-landscape generation result in vastly different fitness landscapes. In this paper we introduce a parameterized model for the construction of test function generators. This model, called embedded landscapes, can be used to isolate the features of combinatorial optimization problems for more control during experimentation. We also show that common forms of embedded landscapes allow for a polynomial time Walsh analysis. This means we can also compute exact schema averages in polynomial time for schemata up to order K, where K is a constant. Yet, in the general case, this information does not allow one to infer the global optimum of a function unless the complexity classes P and NP are equal.
1 Introduction

Performing comparative studies between search algorithms is complicated by the fact that there are few well-understood test problems available to researchers. What has often been done is to use a small test suite of parameter optimization problems to determine whether a particular algorithm performs better than other algorithms. Using this small, seemingly arbitrary set of test functions provides a very narrow view of the performance of an algorithm. An alternative method is to use test function generators (De Jong and Spears 1997), which give researchers a virtually limitless number of test problems that all fall within a broad class of functions. Two well-known examples of classes of simple test function generators are k-satisfiability (kSAT) problems and NK-landscapes. kSAT problems and NK-landscapes can easily be made into test function generators because the problem definitions are general and simple. Both problems have tunable parameters that allow the user to manipulate the character of the problems. Since the domain for both kSAT and NK-landscape functions is the space of binary strings, the representation issues and choice of genetic operators are simplified. These two domains appear, superficially, to be ideal for testing the performance and behavior of genetic algorithms. But are they?

Genetic algorithms are thought to be robust search algorithms. What is often overlooked by practitioners is that standard genetic algorithms, in theory, perform well only when the relationships between schemata can be effectively exploited by the algorithm. The question we pose is: are there useful schema relationships in kSAT and NK-landscape problems that can give genetic algorithms an advantage over local search techniques? To answer this question, we create a general framework for test function generators, called embedded landscapes, that encompasses a broad class of combinatorial problems including both NK-landscapes and kSAT functions. We analyze the epistatic features of NK-landscapes, kSAT problems and embedded landscapes in general. Our analytical results show that, in the case of NK-landscapes and kSAT problems, there is little or no schema information. We provide empirical results for several kinds of search algorithms, including a Simple Genetic Algorithm, to support our analytical results.
2 Two Classic Optimization Problems and a Generalization

In order to define the problems, we first define two functions. Let bc(i) be a bit count or unitation function, which returns the number of 1's in i. Also, let pack be a function $pack: B^L \times B^L \to B^M$, where $M \le L$ and $B = \{0, 1\}$. $pack(x, m)$ takes the bits in x, masks them with an L-bit mask m with $bc(m) = M$, and packs the bits selected by the mask into the result, preserving their order. For example: $pack(10101, 01101) = 011$.
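The two helper functions can be sketched in Python as follows. This is a minimal sketch that assumes bit strings are represented as Python strings of '0' and '1' characters, leftmost bit first; that representation is our choice for illustration, not something the text prescribes.

```python
def bc(s: str) -> int:
    """Bit count (unitation): the number of 1s in the bit string s."""
    return s.count("1")


def pack(x: str, m: str) -> str:
    """Select the bits of x where the L-bit mask m has a 1, keeping their order.

    The result has length bc(m) = M.
    """
    if len(x) != len(m):
        raise ValueError("x and m must both have length L")
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


print(pack("10101", "01101"))  # 011
print(bc("01101"))             # 3
```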
2.1 Two Classic Problems

Two classic problems have surprisingly similar mathematical structure: kSAT problems and NK-landscapes. This led us to the generalized concept of embedded landscapes. As a test function generator, embedded landscapes provide needed control of important features of fitness functions used in testing and analysis.

NK-landscapes (Kauffman 1993) are a popular experimental model for correlated landscapes (Weinberger 1990). An NK-landscape is a function $f: B^N \to \mathbb{R}$, where K is the number of bits in the string that epistatically interact with each bit. An NK-landscape can be expressed as an average of N functions as follows:
$$f(x) = \frac{1}{N} \sum_{j=0}^{N-1} r_j(pack(x, m_j)) \qquad (1)$$
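Equation (1) suggests a simple NK-landscape generator. The sketch below is illustrative only: the helper name `random_nk_landscape`, the random choice of the K interacting bits, and the uniform [0, 1) table entries are assumptions (common choices for NK-landscapes), since the text leaves them open at this point. Each table holds one partial fitness $r_j(y)$ per pattern $y \in B^{K+1}$.

```python
import random


def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


def random_nk_landscape(n: int, k: int, seed: int = 0):
    """Build a random NK-landscape in the form of eq. (1).

    Assumptions (not fixed by the text): the K neighbors of bit j are drawn
    uniformly at random, and each table entry r_j(y) is uniform on [0, 1).
    """
    if not 0 <= k <= n - 1:
        raise ValueError("K must lie in [0, N-1]")
    rng = random.Random(seed)
    masks, tables = [], []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        chosen = set(rng.sample(others, k)) | {j}   # bit j plus its K neighbors
        masks.append("".join("1" if i in chosen else "0" for i in range(n)))
        # One partial fitness per pattern y in B^(K+1), indexed by int(y, 2).
        tables.append([rng.random() for _ in range(2 ** (k + 1))])

    def f(x: str) -> float:
        # Average of the N partial fitnesses r_j(pack(x, m_j)).
        return sum(t[int(pack(x, m), 2)] for m, t in zip(masks, tables)) / n

    return f, masks, tables


f, masks, tables = random_nk_landscape(n=6, k=2, seed=1)
print(masks)
print(f("101100"))
```

With K = 0 every mask selects only its own bit, so f reduces to the bitwise linear case described in the text.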
Each $r_j: B^{K+1} \to \mathbb{R}$ is an evaluation function which gives a partial fitness for each bit pattern formed by the value of the jth bit itself and the K bits it interacts with. Each bit may interact with a possibly different set of K bits. To get this behavior, N masks $m_j$ are used to select the K bits that epistatically interact with the jth bit. The jth bit itself is also selected by the mask, so $bc(m_j) = K + 1$, and K must therefore fall in the range $[0, N-1]$. The pack function uses the masks to select the interacting bits and generate the arguments for the interaction functions $r_j$.

One of the nice features of NK-landscapes is that K acts as a tunable ruggedness control. When K = 0, a bitwise linear function is generated. The resulting landscape is the average of the weights associated with each bit and hence, assuming a Hamming neighborhood, is highly correlated and relatively smooth. When K = N - 1 the function is random and uncorrelated. Note that NK-landscapes are technically not landscapes but rather functions, since no neighborhood has been specified for the domain elements (Jones 1995).

The second classic type of problem is based on the k-satisfiability or kSAT problem. kSAT problems are a commonly used testbed for search algorithms (Hogg et al. 1996, De Jong and Spears 1997). These problems are logical expressions in conjunctive normal form (CNF) with k variables (some of which may be negated) in each disjunctive clause. A problem is satisfiable if there exists an assignment of truth values to the variables that makes the logical expression true. When stochastic search algorithms are applied to kSAT problems they need a gradient to climb, which a Boolean function does not provide. So rather than using logical AND to combine the disjunctive clauses, the Boolean evaluations of the individual clauses are summed, with true being 1 and false being 0. The maximum is then sought to solve the problem. This evaluation function is called MAXSAT (Papadimitriou 1994). If C is the number of disjunctive clauses in the CNF, the MAXSAT function $f: B^N \to \mathbb{R}$ is:
$$f(x) = \sum_{j=0}^{C-1} c_j(pack(x, m_j)) \qquad (2)$$
where $x \in B^N$ represents the true/false assignment to each of the N Boolean variables in the problem. Each $c_j$ represents a Boolean evaluation function applied to the disjunctive clause that uses the k variables selected by mask $m_j$, with $bc(m_j) = k$, via the pack function. In this case $c_j: B^k \to B$.
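A MAXSAT evaluation in the form of eq. (2) can be sketched the same way. The helper `random_maxsat` and its clause-sampling rule are hypothetical. The sketch relies on one fact worth making explicit: a disjunctive clause over k distinct variables is false for exactly one assignment of those k variables, so each $c_j$ can be stored as a mask plus its single falsifying pattern.

```python
import random


def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


def random_maxsat(n: int, k: int, c: int, seed: int = 0):
    """Random kSAT instance scored as MAXSAT, eq. (2).

    Assumptions (a common random-kSAT recipe, not specified in the text):
    each clause draws k distinct variables uniformly at random and negates
    each with probability 1/2. A disjunctive clause is false for exactly one
    pattern of its k variables, the one that makes every literal false.
    """
    rng = random.Random(seed)
    clauses = []
    for _ in range(c):
        chosen = set(rng.sample(range(n), k))
        mask = "".join("1" if i in chosen else "0" for i in range(n))
        # Each literal (left to right within the mask) is negated with prob. 1/2.
        # A plain literal is false at 0, a negated literal is false at 1.
        falsifying = "".join("1" if rng.random() < 0.5 else "0" for _ in range(k))
        clauses.append((mask, falsifying))

    def f(x: str) -> int:
        # c_j(pack(x, m_j)) is 1 unless the packed bits equal the falsifying pattern.
        return sum(pack(x, m) != bad for m, bad in clauses)

    return f, clauses


f, clauses = random_maxsat(n=8, k=3, c=12, seed=0)
best = max((format(i, "08b") for i in range(2 ** 8)), key=f)
print(best, f(best), "of", len(clauses), "clauses satisfied")
```

The exhaustive `max` over $B^8$ is only for illustration; it is exactly the search the text says becomes hard in general.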
2.2 Embedded Landscapes as a Generalization

Embedded landscapes are a useful generalization of these two classes of functions. An embedded landscape, $f: B^N \to \mathbb{R}$, can be expressed as the sum of P embedded functions:

$$f(x) = \sum_{j=0}^{P-1} g_j(pack(x, m_j)) \qquad (3)$$
There are no restrictions on the number of functions P, on the number of 1 bits in each interaction mask $m_j \in B^N$, or on the values returned by the interaction functions $g_j: B^{bc(m_j)} \to \mathbb{R}$. The term embedded landscape comes from the idea that the $g_j$ are often of lower dimension than N and are embedded in the higher dimensional space via the pack function.
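Equation (3) translates almost word for word into code. In this minimal sketch the name `embedded_landscape` and the two example subfunctions `g0` and `g1` are hypothetical; bit strings are Python strings, leftmost bit first. The two example masks overlap in one bit.

```python
from typing import Callable, Sequence


def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


def embedded_landscape(masks: Sequence[str],
                       subfunctions: Sequence[Callable[[str], float]]):
    """Return f with f(x) = sum_j g_j(pack(x, m_j)), as in eq. (3).

    No restriction is placed on P, on the masks, or on the values the g_j
    return; each g_j only sees the bc(m_j) bits that its mask selects.
    """
    if len(masks) != len(subfunctions):
        raise ValueError("need exactly one subfunction g_j per mask m_j")
    pairs = list(zip(masks, subfunctions))

    def f(x: str) -> float:
        return sum(g(pack(x, m)) for m, g in pairs)

    return f


def g0(y: str) -> float:
    # Hypothetical order-3 subfunction: unitation of y, except 000 also scores 3.
    return 3.0 if y == "000" else float(y.count("1"))


def g1(y: str) -> float:
    # Hypothetical linear subfunction: counts the 1s it sees.
    return float(y.count("1"))


f = embedded_landscape(["11100", "00111"], [g0, g1])
print(f("00000"), f("11111"))   # 3.0 6.0
```

Under this reading, an NK-landscape is the special case with P = N and $g_j = r_j / N$, and MAXSAT is the case with P = C and $g_j = c_j$.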
Embedded landscapes allow us to independently control several important aspects of a function. By adjusting the value P, we control the number of subfunctions defined over different partitions of the search space. We can, for example, study functions with a variety of densities and distributions of subfunctions. With the masks $m_j$, we can control the exact overlap between subfunction domains. For example, this could be used to study interaction and competition between different partitions of the search space during genetic search. The subfunctions $g_j$ control the ordering by fitness of the hyperplanes in each partition defined by $m_j$. They also allow us to compare functions constructed out of simple basic classes of smaller subfunctions such as flat (constant), random, linear and unimodal functions. Although all of this can be done without invoking embedded landscapes, the model provides a ready-made conceptual and mathematical framework with several useful orthogonal "knobs" for adjusting function structure.

It is apparent from the definition that both NK-landscapes and MAXSAT problems are forms of embedded landscapes. Embedded landscapes also have the potential to represent many other combinatorial problems that are the sum of the effects of a limited number of interactions between objects.

Let's look at an example of how a MAXSAT problem and an NK-landscape can both be expressed as embedded landscapes. Let $v = [A, B, C, D]$ be the vector of variables and let the vector $x \in B^4$ be an assignment of Boolean values to the variables. Consider the following MAX2SAT problem:

$$(A \vee \bar{B}) + (B \vee \bar{C}) + (\bar{A} \vee C) + (\bar{B} \vee \bar{D}) \qquad (4)$$
and an NK-landscape with N = 4 and K = 1 with the following bit interactions:
$$m_0 = 1100, \quad m_1 = 0110, \quad m_2 = 1010, \quad m_3 = 0101 \qquad (5)$$
and associated epistatic functions $r_0, r_1, r_2, r_3$ described below. Both these functions can be represented as embedded landscapes. For convenience, the functions are chosen so that they have the same interaction masks, and the interaction functions for both problems are defined in the same table below. As an embedded landscape, the MAX2SAT problem has each disjunctive clause represented as an interaction function $c_i$ and the value of each logical variable as a bit position in the argument string. Then the MAX2SAT problem $SAT: B^4 \to \mathbb{R}$ is

$$SAT(x) = c_0(pack(x, m_0)) + c_1(pack(x, m_1)) + c_2(pack(x, m_2)) + c_3(pack(x, m_3)) \qquad (6)$$
The NK-landscape $NK: B^4 \to \mathbb{R}$ is

$$NK(x) = \frac{1}{4}\left(r_0(pack(x, m_0)) + r_1(pack(x, m_1)) + r_2(pack(x, m_2)) + r_3(pack(x, m_3))\right) \qquad (7)$$

with the interaction functions shown in the following table, where y represents the value extracted by the pack function, $pack(x, m_i)$.
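$$
\begin{array}{l|cccc}
y & 00 & 01 & 10 & 11 \\
\hline
\text{NK subfunctions} & & & & \\
r_0 & 0.31 & 0.53 & 0.23 & 0.33 \\
r_1 & 0.41 & 0.58 & 0.84 & 0.83 \\
r_2 & 0.59 & 0.97 & 0.62 & 0.27 \\
r_3 & 0.26 & 0.93 & 0.64 & 0.95 \\
\hline
\text{2SAT subfunctions} & & & & \\
c_0 & 1 & 0 & 1 & 1 \\
c_1 & 1 & 0 & 1 & 1 \\
c_2 & 1 & 1 & 0 & 1 \\
c_3 & 1 & 1 & 1 & 0
\end{array}
\qquad (8)
$$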
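The two example functions can be checked by brute force over all 16 assignments. This sketch assumes x is written as ABCD with A as the leftmost bit and that pack keeps the selected bits in left-to-right order, so for $m_0 = 1100$ the value y = 01 means A = 0, B = 1. The constants `MASKS`, `R` and `C` simply transcribe masks (5) and table (8); note that each $c_j$ row has exactly one 0, as a two-literal clause must.

```python
def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


MASKS = ["1100", "0110", "1010", "0101"]   # m_0 .. m_3, eq. (5)
# Rows of table (8); column index is int(y, 2) for y = 00, 01, 10, 11.
R = [[0.31, 0.53, 0.23, 0.33],
     [0.41, 0.58, 0.84, 0.83],
     [0.59, 0.97, 0.62, 0.27],
     [0.26, 0.93, 0.64, 0.95]]
C = [[1, 0, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]


def sat(x: str) -> int:
    """Eq. (6): the number of satisfied clauses."""
    return sum(c[int(pack(x, m), 2)] for m, c in zip(MASKS, C))


def nk(x: str) -> float:
    """Eq. (7): the average of the four partial fitnesses."""
    return sum(r[int(pack(x, m), 2)] for m, r in zip(MASKS, R)) / 4


for i in range(16):
    x = format(i, "04b")           # x = ABCD, A leftmost
    print(x, sat(x), round(nk(x), 4))
```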
Lee Altenberg (1994; 1996) first introduced the idea of a generalized NK-landscape with a vector of interaction masks in the form of a matrix. His work mainly deals with random interaction functions. We independently created the same natural generalization in Heckendorn and Whitley (1997), where we referred to the concept as NKP landscapes. With NKP landscapes, we emphasized the generality of the embedded functions. In this paper we want to further emphasize the embedding concept, so we have changed the name from NKP to embedded landscapes. We reluctantly chose to use the word "landscape" to show an association with NK-landscapes, even though the connectivity of the domain is not specified.
3 An Analysis of Embedded Landscapes

An embedded landscape is a sum of simpler, lower dimensional subfunctions. This method of combining subfunctions simplifies the analysis of embedded landscapes. To understand the structure of embedded landscapes we must first understand how each lower dimensional subfunction is embedded in the higher dimension of the landscape.

The domain of a function $f: B^L \to \mathbb{R}$ can be partitioned into nonintersecting hyperplanes. A partition is specified by a string with a b in the positions called fixed bit positions and *'s in the remaining positions. For example, *b** represents a partition that contains the two hyperplanes *1** and *0**. A partition with M b's defines a set of $2^M$ nonintersecting hyperplanes, each composed of $2^{L-M}$ strings, whose union is all of the strings in the domain. A partition of the domain also yields a hyperplane numbering: any hyperplane h can be uniquely identified by two bitstrings. The first indicates the partition, which selects the fixed bit positions, and the second indicates how the fixed positions should be set. To make this more formal, denote the partition string as a binary string by replacing the b's by 1's and the *'s by 0's. In the following, this string also corresponds to the mask m. We use $h_{m,j}$, with $j \in B^M$, $m \in B^L$, $bc(m) = M$ and $M \le L$, to denote an order-M hyperplane in $B^L$ such that $x \in h_{m,j}$ if and only if $pack(x, m) = j$. In this case m selects how the domain is partitioned and j selects one of the hyperplanes from the partition. The partitioning of the domain space by m can be viewed as a concatenation of $2^{bc(m)}$ hyperplanes $h_{m,j}$, represented as a vector:
$$\left[\; h_{m,0} \quad h_{m,1} \quad h_{m,2} \quad \cdots \quad h_{m,2^{bc(m)}-1} \;\right] \qquad (9)$$
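The definition $x \in h_{m,j} \iff pack(x, m) = j$ can be sketched by grouping every string in $B^L$ under its packed value. The helper name `hyperplanes` is hypothetical, and strings are again Python str values, leftmost bit first. For the partition *b*bb, the group with j = 100 is the hyperplane *1*00.

```python
from collections import defaultdict
from itertools import product


def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


def hyperplanes(m: str) -> dict:
    """Group every x in B^L by j = pack(x, m); group j is the hyperplane h_{m,j}."""
    groups = defaultdict(list)
    for bits in product("01", repeat=len(m)):
        x = "".join(bits)
        groups[pack(x, m)].append(x)
    return dict(groups)


parts = hyperplanes("01011")   # the partition *b*bb
print(len(parts))              # 8, i.e. 2^bc(m) hyperplanes
print(parts["100"])            # ['01000', '01100', '11000', '11100']
```

Each group has $2^{L - bc(m)} = 4$ members, and together the groups cover $B^5$ exactly once, as the definition of a partition requires.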
Writing each hyperplane $h_{m,j}$ in (9) as a column of its $2^{bc(\bar{m})}$ strings forms a $2^{bc(\bar{m})} \times 2^{bc(m)}$ matrix H such that if $H_{i,j} = x$ then $i = pack(x, \bar{m})$ and $j = pack(x, m)$, where $\bar{m}$ is the complement of m. For example, a function in $B^5$ with a partition of *b*bb would give $m = 01011$, $\bar{m} = 10100$ and the following domain matrix, whose columns are indexed by $pack(x, 01011)$ and whose rows are indexed by $pack(x, 10100)$:

$$
\begin{array}{c|cccccccc}
 & 000 & 001 & 010 & 011 & 100 & 101 & 110 & 111 \\
\hline
00 & 00000 & 00001 & 00010 & 00011 & 01000 & 01001 & 01010 & 01011 \\
01 & 00100 & 00101 & 00110 & 00111 & 01100 & 01101 & 01110 & 01111 \\
10 & 10000 & 10001 & 10010 & 10011 & 11000 & 11001 & 11010 & 11011 \\
11 & 10100 & 10101 & 10110 & 10111 & 11100 & 11101 & 11110 & 11111
\end{array}
\qquad (10)
$$

In this matrix, the bits in the positions selected by m (the second, fourth and fifth bits) are determined by the column index $pack(x, m)$; all other bits are determined by the row index $pack(x, \bar{m})$.

Now consider a subfunction $g: B^M \to \mathbb{R}$ from an embedded landscape in the space $B^L$. Assume the function uses mask m to select the interaction bits. $g(pack(x, m))$ is now a function on $B^L$ in which all of the elements in hyperplane $h_{m, pack(x,m)}$ are set to $g(pack(x, m))$. Arranging the function values in a matrix as we did earlier for the domain values, with columns indexed by $pack(x, m)$ and rows by $pack(x, \bar{m})$, we get a matrix of function values that is constant within each column:

$$
\begin{array}{c|ccccc}
 & 0 & 1 & 2 & \cdots & 2^{bc(m)}-1 \\
\hline
0 & g(0) & g(1) & g(2) & \cdots & g(2^{bc(m)}-1) \\
1 & g(0) & g(1) & g(2) & \cdots & g(2^{bc(m)}-1) \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
2^{bc(\bar{m})}-1 & g(0) & g(1) & g(2) & \cdots & g(2^{bc(m)}-1)
\end{array}
\qquad (11)
$$

Note that this matrix results from embedding a single function g into f. From the matrix we can see that embedding a function using mask m induces a partitioning of the function domain, and all the strings in each hyperplane of this partition of f are assigned a single function value as their fitness. A simple example: if g uses mask 0001110000, then strings in f that are members of hyperplane ***101**** all have the same evaluation, strings in hyperplane ***000**** also all have the same evaluation, and so on. Thus a plateau phenomenon results: flat regions corresponding to specific hyperplanes are present in the function.

When multiple subfunctions are used, the embedded landscape is the sum of these subfunctions. Each mask repartitions the space, and each subfunction assigns a value to each hyperplane it induces. Yet flat regions can remain. Features of embedded landscapes that tend to preserve plateaus are: small masks, overlapping masks, few subfunctions, subfunctions that themselves contain plateaus, and subfunctions that share a small set of range values. Formally, a plateau is a connected set of points in the domain that all have equal fitness.
By connected we mean connected in the graph-theoretic sense: we assume points are nodes, and two points have an edge between them if they are Hamming distance one apart. Plateaus can, in fact, be defined for any neighborhood operator. Plateaus can cause difficulty for stochastic optimization algorithms that rely strongly on moves in Hamming space, because for points in the interior of a plateau, and for many along its edges, there is no productive gradient information.

One of the major uses of embedded landscapes is to create functions by combining a number of smaller and often simpler subfunctions. Embedded landscapes are sufficiently general that any function can be expressed as an embedded landscape (trivially, with P = 1 and a mask of all 1's). However, if P becomes large in comparison with N, the interactions between the embedded functions become difficult to study. Furthermore, if $\max_{j=0,\dots,P-1} bc(m_j)$ approaches N, then the interaction functions themselves are not much easier to study than the whole function.
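The plateau definition above (equal fitness, connected under the Hamming-distance-one neighborhood) can be turned into a flood fill over $B^n$. This is a brute-force sketch for small n only; `plateaus` and the example subfunction `g_embedded` are hypothetical names. The example embeds a subfunction with a distinct value per hyperplane using the mask 0001110000 from the text, so each of the $2^3$ hyperplanes of $2^7$ strings should come out as its own plateau.

```python
from itertools import product


def plateaus(f, n: int):
    """Split B^n into plateaus: maximal sets of equal-fitness strings that are
    connected when strings at Hamming distance 1 are joined by an edge.

    Fitness values are compared with ==, so f should return exactly equal
    values on points meant to share a plateau.
    """
    points = ["".join(bits) for bits in product("01", repeat=n)]
    value = {x: f(x) for x in points}
    seen, result = set(), []
    for start in points:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:                      # depth-first flood fill
            x = stack.pop()
            component.append(x)
            for i in range(n):            # every one-bit-flip neighbor of x
                y = x[:i] + ("1" if x[i] == "0" else "0") + x[i + 1:]
                if y not in seen and value[y] == value[x]:
                    seen.add(y)
                    stack.append(y)
        result.append(component)
    return result


MASK = "0001110000"   # mask from the example in the text


def g_embedded(x: str) -> int:
    # Hypothetical g(y) = int(y, 2): a different value on each hyperplane of MASK.
    return int("".join(b for b, m in zip(x, MASK) if m == "1"), 2)


ps = plateaus(g_embedded, 10)
print(len(ps), sorted({len(p) for p in ps}))   # 8 [128]
```

A linear function such as unitation is the opposite extreme: every Hamming neighbor has a different value, so each point is its own plateau of size 1.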
3.1 Walsh Analysis of Embedded Landscapes

Two important characteristics of embedded landscapes are limited epistasis (i.e., nonlinearity) and the plateau phenomenon. We use Walsh analysis (Bethke 1981) to study these two features. Walsh coefficients are a direct measure of both the linear and nonlinear epistatic interactions of the bits in a function (Goldberg 1989b, Reeves and Wright 1995). Any function $f: B^L \to \mathbb{R}$ can be uniquely decomposed into a linear weighted sum of Walsh functions $\psi_i$, where the weights are the real-valued Walsh coefficients $w_i$:
$$f(x) = \sum_{i=0}^{2^L-1} w_i \psi_i(x) \qquad (12)$$
This creates an invertible linear transformation between functions expressed as vectors of $2^L$ values of f and the corresponding vectors of $2^L$ Walsh coefficients. If the ith Walsh coefficient $w_i$ is zero, then we can conclude that exactly the bit combination identified by the 1 bits in the index i makes no contribution to the value of the function. For example, if $w_7 = 0$, then the three bits indicated by the 1's of the index written in binary (0111) do not epistatically interact. Note that this does not prevent subsets and supersets of those bits from interacting; that is, it may be the case that $w_3$, $w_5$, $w_6$ and $w_{15}$ are nonzero. The order of a Walsh coefficient is the number of 1 bits in its index. A measure that uses Walsh coefficients to quantify the maximum level of epistasis in a function is $\epsilon(f)$, which measures the largest number of bits among which there are epistatic interactions:
$$\epsilon(f) = \max_{w_i \neq 0} bc(i) \qquad (13)$$
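Equations (12) and (13) can be computed with a fast Walsh-Hadamard transform. This is a sketch under an assumed convention, $\psi_i(x) = (-1)^{bc(i \wedge x)}$ with $w_i = 2^{-L}\sum_x f(x)\psi_i(x)$, which is the usual Bethke/Goldberg normalization; the function names are hypothetical, and x and i are integers whose binary digits are the bit strings. The example uses unitation on three bits, which is linear, so only order-0 and order-1 coefficients should survive.

```python
def walsh_coefficients(values):
    """Walsh coefficients w_i of f, given values[x] = f(x) for x = 0 .. 2^L - 1.

    Assumed convention: psi_i(x) = (-1)^bc(i AND x) and
    w_i = 2^-L * sum_x f(x) psi_i(x), so that f(x) = sum_i w_i psi_i(x).
    Uses the in-place fast Walsh-Hadamard transform, O(L 2^L) operations.
    """
    w = [float(v) for v in values]
    size = len(w)
    if size == 0 or size & (size - 1):
        raise ValueError("need exactly 2^L function values")
    h = 1
    while h < size:
        for start in range(0, size, 2 * h):
            for k in range(start, start + h):
                a, b = w[k], w[k + h]
                w[k], w[k + h] = a + b, a - b   # butterfly step
        h *= 2
    return [c / size for c in w]


def epistasis(w, tol=1e-12):
    """Eq. (13): the largest order bc(i) among the nonzero Walsh coefficients."""
    return max((bin(i).count("1") for i, c in enumerate(w) if abs(c) > tol), default=0)


linear = [bin(x).count("1") for x in range(8)]   # f(x) = bc(x) on B^3
w = walsh_coefficients(linear)
print(w)             # [1.5, -0.5, -0.5, 0.0, -0.5, 0.0, 0.0, 0.0]
print(epistasis(w))  # 1
```

By contrast, the three-bit AND (1 only at x = 111) has a nonzero $w_7$ and therefore epistasis 3.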
It has been shown (Heckendorn and Whitley 1997) that if a single function $g: B^M \to \mathbb{R}$ is embedded in a higher dimensional space $B^L$, $L > M$, using mask m, then the resulting function $f: B^L \to \mathbb{R}$ has Walsh coefficients:
$$w^f_i = \begin{cases} w^g_{pack(i, m)} & \text{if } i \subseteq m \\ 0 & \text{otherwise} \end{cases} \qquad (14)$$

where $w^g$ and $w^f$ are the Walsh coefficients for g and f respectively; $i, m \in B^L$; $bc(m) = M$; and the notation $i \subseteq m$ means that everywhere a 1 bit occurs in i it must also occur in m. The critical observation is that even though all of the function values in f may be nonzero, only those Walsh coefficients $w_i$ with $i \subseteq m$ can be nonzero. That is, for the Walsh-transformed function f, the hyperplane $h_{\bar{m},0}$ contains the only nonzero Walsh coefficients. This leads to a second important observation: embedding a lower dimensional function, such as g above, in a higher dimensional space, as with function f above, increases neither the number of nonzero Walsh coefficients nor the maximum level of epistasis. In terms of the matrix of Walsh coefficients, applying the Walsh transform to the earlier matrix (eq. 11) for the embedded function g yields, with the Walsh index i arranged in the same way (columns indexed by $pack(i, m)$, rows by $pack(i, \bar{m})$):

$$
\begin{array}{c|ccccc}
 & 0 & 1 & 2 & \cdots & 2^{bc(m)}-1 \\
\hline
0 & w_0 & w_1 & w_2 & \cdots & w_{2^{bc(m)}-1} \\
1 & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
2^{bc(\bar{m})}-1 & 0 & 0 & 0 & \cdots & 0
\end{array}
\qquad (15)
$$
where the $w_i$ are the Walsh coefficients for g. (It is critical to note that the indices shown here are taken from g; their indices in f depend on the mask.) Notice that the only nonzero Walsh coefficients have an index whose 1 bits are contained entirely in the mask m. Therefore, there are at most $2^{bc(m)}$ nonzero Walsh coefficients, and all the other Walsh coefficients are zero. This is reasonable, since bits outside of mask m should have no effect on the value of the higher dimensional function. Note that for $bc(m) \ll L$ the ratio of nonzero Walsh coefficients to the total number available, at most $2^{bc(m)}/2^L$, becomes exponentially small, which strongly constrains the complexity of the function. Because an embedded landscape is a sum of subfunctions and the Walsh transform is a linear transformation, the ith Walsh coefficient of the embedded landscape f with subfunctions $g_j$ can be computed by:
$$w^f_i = \sum_{j=0}^{P-1} w^{g_j}_i \qquad (16)$$
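Equations (14) and (16) can be checked numerically on a small case. In this sketch (hypothetical helpers `walsh`, `embed`, `contained` and `check_embedding`; assumed Walsh convention $\psi_i(x) = (-1)^{bc(i \wedge x)}$ with a $2^{-L}$ normalization), two random subfunctions are embedded in $B^6$ with overlapping masks. The first check compares each embedded transform with eq. (14); the second confirms that the transform of the sum equals the sum of the transforms.

```python
import random


def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


def walsh(values):
    """w_i = 2^-L * sum_x f(x) * (-1)^bc(i AND x), by a fast Walsh-Hadamard transform."""
    w, h = [float(v) for v in values], 1
    while h < len(w):
        for s in range(0, len(w), 2 * h):
            for k in range(s, s + h):
                w[k], w[k + h] = w[k] + w[k + h], w[k] - w[k + h]
        h *= 2
    return [c / len(w) for c in w]


def embed(g_values, m: str):
    """Values of f(x) = g(pack(x, m)) for x = 0 .. 2^L - 1, x written MSB first."""
    L = len(m)
    return [g_values[int(pack(format(x, f"0{L}b"), m), 2)] for x in range(2 ** L)]


def contained(i: str, m: str) -> bool:
    """i ⊆ m: every 1 bit of i is also a 1 bit of m."""
    return all(mb == "1" for ib, mb in zip(i, m) if ib == "1")


def check_embedding(g_values, m: str, tol: float = 1e-12) -> bool:
    """Compare the Walsh transform of the embedded f against eq. (14)."""
    L, wg = len(m), walsh(g_values)
    wf = walsh(embed(g_values, m))
    for i in range(2 ** L):
        s = format(i, f"0{L}b")
        expected = wg[int(pack(s, m), 2)] if contained(s, m) else 0.0
        if abs(wf[i] - expected) > tol:
            return False
    return True


rng = random.Random(0)
masks = ["110010", "011000"]                    # hypothetical masks, L = 6
gs = [[rng.random() for _ in range(2 ** m.count("1"))] for m in masks]
print([check_embedding(g, m) for g, m in zip(gs, masks)])   # [True, True]

# Eq. (16): the transform of the sum equals the sum of the transforms.
f_total = [a + b for a, b in zip(embed(gs[0], masks[0]), embed(gs[1], masks[1]))]
w_total = walsh(f_total)
w_sum = [a + b for a, b in zip(walsh(embed(gs[0], masks[0])), walsh(embed(gs[1], masks[1])))]
print(max(abs(a - b) for a, b in zip(w_total, w_sum)))
print(sum(abs(c) > 1e-12 for c in w_total), "of", len(w_total), "coefficients are nonzero")
```

With these masks the two supports share the indices contained in $m_1 \wedge m_2 = 010000$, so at most 8 + 4 - 2 = 10 of the 64 coefficients can be nonzero.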
We use this to illustrate what happens as the number of embedded functions increases. When P = 1 the landscape consists of the expansion of a single subfunction g by a mask m, resulting in every value in hyperplane $h_{m,x}$ being assigned $g(x)$ (see eq. 9). In Walsh space this makes the Walsh coefficients in the hyperplane $h_{\bar{m},0}$ the only possible nonzero Walsh coefficients. As P increases, the additive property of Walsh coefficients, a consequence of the linearity of the Walsh transform, means that successive nonzero hyperplanes $h_{\bar{m}_b,0}$ are added in Walsh space, while in function space successive layers of constant hyperplanes $h_{m_b,x}$ are added.

Plateaus are guaranteed to exist in embedded landscapes for small P and small to moderate size interaction masks. As we saw with a single embedded function (P = 1), the space is partitioned into $2^{bc(m_1)}$ hyperplanes, each forming a flat region of $2^{bc(\bar{m}_1)}$ strings. A second (P = 2) embedded function tends to redivide the space into a new set of hyperplanes, cutting each existing hyperplane into as many as $2^{bc(m_2)}$ pieces. To be precise, two subfunctions with interaction masks $m_1$ and $m_2$ produce at most $2^{bc(m_1 \vee m_2)}$ plateaus, each containing at least $2^{bc(\overline{m_1 \vee m_2})}$ strings. Depending on the values of the subfunctions there may be fewer plateaus. This extends in the obvious manner to larger numbers of subfunctions. As P increases, the overlaying of partitions tends to redivide the space into smaller and smaller plateaus, shrinking them exponentially, by a factor of up to $2^{bc(m_j)}$ for each added subfunction. As we will see, this process is slowed in the case of embedded functions which themselves have plateaus.
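The plateau bound for several subfunctions can be illustrated by brute force. The sketch below (hypothetical helpers `union` and `check_cells`, random table values) builds an embedded landscape from the first P masks of a small example, confirms that it is constant on every hyperplane of the union mask $m_1 \vee \dots \vee m_P$, and reports how many such cells there are and how large each is. Plateaus are unions of these cells, so the cell count is an upper bound on the number of plateaus. The third mask overlaps the first two, so adding it only doubles the number of cells instead of multiplying it by $2^{bc(m_3)} = 4$.

```python
import random
from itertools import product


def pack(x: str, m: str) -> str:
    """Bits of x where mask m is 1, in left-to-right order."""
    return "".join(xb for xb, mb in zip(x, m) if mb == "1")


def union(masks):
    """Bitwise OR of equal-length mask strings."""
    return "".join("1" if "1" in col else "0" for col in zip(*masks))


def check_cells(masks, seed=0):
    """Build a random embedded landscape from `masks` and confirm it is constant
    on every hyperplane of the union mask u. Returns (number of cells, cell size),
    i.e. (2^bc(u), 2^(L - bc(u)))."""
    rng = random.Random(seed)
    tables = [[rng.random() for _ in range(2 ** m.count("1"))] for m in masks]
    u = union(masks)
    cells = {}
    for bits in product("01", repeat=len(u)):
        x = "".join(bits)
        fx = sum(t[int(pack(x, m), 2)] for m, t in zip(masks, tables))
        cells.setdefault(pack(x, u), set()).add(fx)
    assert all(len(vals) == 1 for vals in cells.values())   # one value per cell
    return len(cells), 2 ** len(u) // len(cells)


masks = ["1100000000", "0001100000", "0110000000", "0000001110"]
for p in range(1, len(masks) + 1):
    print(p, check_cells(masks[:p]))
# 1 (4, 256)
# 2 (16, 64)
# 3 (32, 32)
# 4 (256, 4)
```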
3.2 NP-hardness and NP-completeness of Embedded Landscapes

Although embedded landscapes can be quite limited in complexity, that doesn't make them easy. A decision language, L, is said to be NP-hard if
L0